Abstract. A closure system on a finite set is a unifying concept in logic programming, relational databases and knowledge systems. It can also be presented in terms of finite lattices, and the tools for an economical description of a finite lattice have long existed in lattice theory. We present this approach by describing the so-called D-basis and introducing the concept of an ordered direct basis of an implicational system. A direct basis of a closure operator, or an implicational system, is a set of implications that allows one to compute the closure of an arbitrary set by a single iteration. This property is preserved by the D-basis at the cost of following a prescribed order in which the implications are attended. In particular, using an ordered direct basis allows one to optimize the forward chaining procedure in logic programming that uses the Horn fragment of propositional logic. One can extract the D-basis from any direct unit basis Σ in time polynomial in the size of Σ, and it takes only time linear in the cardinality of the D-basis to put it into a proper order. We produce examples of closure systems on a 6-element set for which the canonical basis of Duquenne and Guigues is not ordered direct.



1. Introduction 

In K. Bertet and B. Monjardet [5], it is shown that five implicational bases
for a closure operator on a finite set, found in various contexts in the literature, 
are actually the same. The goal of this paper is to demonstrate that standard 
lattice-theoretic results about the "most economical way" to describe the structure 
of a finite lattice may be transformed into a basis for a closure system naturally 
associated with that lattice. 

The coding of a finite lattice in the form of a so-called OD-graph was first suggested in [16]. We will call the basis directly following from this OD-graph a D-basis, since it is closely associated with a D-relation on the set of join-irreducibles of a lattice (not necessarily finite) that was crucial in the studies of free and lower bounded lattices, see [S]. The definition and the proof that the D-basis does define the given closure system are given in section 4.

The D-basis is a subset of a so-called dependence relation basis (Definition 6 in [5]). Thus, it is also a subset of the canonical direct unit basis that unifies the five bases discussed in [5]. In section 5 we give an example to demonstrate that the reverse inclusion does not hold, thus showing that this newly introduced D-basis is generally shorter than the existing ones.
Recall that the main desirable feature of bases from [5] is that they be direct,
which means that the computation of the closure of any subset can be done by 
attending each implication from the basis only once. This makes the computation 
of closures a one-iteration process. 

While the D-basis is not direct in this sense of the term, closures can still be computed in a single iteration of the basis, provided the basis was put in a specific order prior to computation. Moreover, there is a simple and effective linear time algorithm for ordering a D-basis appropriately. Thus, applying the D-basis can be compared to the iteration known in artificial intelligence as the forward chaining algorithm, see for example [12].

We introduce the definitions of ordered iteration and ordered direct basis in section 6, where we also prove that the D-basis is ordered direct and discuss the algorithmic aspects of ordering it. Further directions of optimization of the D-basis are outlined in section 8, where we also introduce the notion of an ordered direct sequence built from a given basis of a closure system.

In section 9 we also discuss the so-called E-relation, introduced in [9], which leads to the definition of the E-basis in closure systems without D-cycles. In general, the implications written from the E-relation do not necessarily form a basis of a closure system, but in closure systems without D-cycles the E-basis is ordered direct, is contained in the D-basis, and is often shorter than the D-basis. We discuss a polynomial time algorithm for ordering the E-basis.

We explore the connections between the D-basis, the E-basis and the so-called canonical basis introduced by Duquenne and Guigues. While the canonical basis has the minimal number of implications among all the bases of a closure system, it does not have the feature of the D-basis or the E-basis discussed in this paper, namely, in general it cannot be turned into an ordered direct basis. Section 10 of our paper presents examples of closure systems on a 6-element set for which the canonical basis cannot be ordered. As a result, the time required for one iteration of the D-basis wins over at least two iterations of the canonical basis. Further polynomial-time optimizations of both the D-basis and the canonical basis are discussed.

Section 7 is devoted to the discussion and testing of the forward chaining algorithm in comparison to the ordered direct basis algorithm. Section 11 provides test results comparing the performance of the D-basis with the Duquenne-Guigues canonical basis and the canonical direct unit basis.

The next two sections contain the required definitions and establish connections 
between finite lattices, closure operators, implicational systems, Horn formulas and 
Horn Boolean functions. The reader may consult the survey [4] for various aspects
of closure systems on finite sets.



2. Lattices and closure operators 

By a lattice, one means an algebra with two binary operations $\wedge$, $\vee$, called meet and join, respectively. Both operations are idempotent, commutative and associative, and are connected by the absorption laws: $x\vee(x\wedge y) = x$ and $x\wedge(x\vee y) = x$. These laws allow us to define a partial order on the base set of the lattice: $x\le y$ if and only if $x\wedge y = x$ (which is equivalent to $x\vee y = y$). Vice versa, every partially ordered set in which every two elements have a least upper bound and a greatest lower bound is, in effect, a lattice. Indeed, in this case the operation $\vee$ can be defined as the least upper bound, and $\wedge$ as the greatest lower bound of two elements in the poset. A lattice is finite when the base set of this algebra is finite. The symbols $\bigwedge$, $\bigvee$ are used when more than two elements meet or join. We will use the notation 0 for the least element of a lattice, and 1 for its greatest element. If $a\le b$ in a lattice L, then we denote by $[a,b]$ the interval in L, i.e., the set of all c satisfying $a\le c\le b$.

Recall now the standard connection between a closure operator on a set and the lattice of its closed sets. Given a non-empty set S and the set $P(S) = 2^S$ of all its subsets, a closure operator is a map $\phi: P(S)\to P(S)$ that satisfies the following, for all $X, Y\in P(S)$:

(1) increasing: $X\subseteq\phi(X)$;

(2) isotone: $X\subseteq Y$ implies $\phi(X)\subseteq\phi(Y)$;

(3) idempotent: $\phi(\phi(X)) = \phi(X)$.

It will be convenient for us to refer to the pair $(S,\phi)$ of a set S and a closure operator on it as a closure system.
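For small finite sets, the three axioms can be verified by brute force. The Python sketch below is our own illustration, not part of the original text: subsets are represented as `frozenset`s, $\phi$ is any Python callable, and the helper names `powerset` and `is_closure_operator` are hypothetical. The check is exponential in $|S|$.

```python
from itertools import combinations


def powerset(S):
    """All subsets of S, as frozensets."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_closure_operator(S, phi):
    """Check axioms (1)-(3) for phi: P(S) -> P(S) on every subset of S."""
    subsets = powerset(S)
    for X in subsets:
        cX = phi(X)
        if not X <= cX:                      # (1) increasing
            return False
        if phi(cX) != cX:                    # (3) idempotent
            return False
        for Y in subsets:                    # (2) isotone
            if X <= Y and not cX <= phi(Y):
                return False
    return True
```

For example, the map that adds a to every set containing b passes all three checks, while the constant map to the empty set violates (1).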

A subset $X\subseteq S$ is called closed if $\phi(X) = X$. The collection of closed subsets of a closure operator $\phi$ on S forms a lattice, which is usually called the closure lattice of the closure system $(S,\phi)$. This paper deals only with finite closure systems and finite lattices.

Conversely, we can associate with every finite lattice L a particular closure system $(S,\phi)$ in such a way that L is isomorphic to the closure lattice of that closure system. Consider $J(L)\subseteq L$, the subset of join-irreducible elements. An element $j\in L$ is called join-irreducible if $j\neq 0$, and $j = a\vee b$ implies $a = j$ or $b = j$. We define a closure system with $S = J(L)$ and the following closure operator:

$$\phi(X) = [0, \bigvee X]\cap J(L).$$

It is straightforward to check that the closure lattice of $\phi$ is isomorphic to L.

Example 1. Consider a simple example illustrating a closure system built from the lattice $L = \{0, a, b, c, 1\}$, for which $0<a<b<1$, $0<c<1$, $a\vee c = b\vee c = 1$ and $a\wedge c = b\wedge c = 0$. Then $S = J(L) = \{a, b, c\}$. The closed subsets are $[0,x]\cap J(L)$ for $x\in L$, which are $\emptyset$, $\{a\}$, $\{c\}$, $\{a,b\}$ and $\{a,b,c\}$. Knowing all closed subsets, one can define the closure of X, or $\phi(X)$, as the smallest closed set containing X. For example, $\phi(\{b\}) = \{a,b\}$.
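The rule "smallest closed set containing X" can be computed as the intersection of all closed sets that contain X. A minimal sketch in Python, assuming the closed sets are given explicitly as a list of `frozenset`s forming a Moore family (it contains S and is closed under intersection); the name `closure_from_closed_sets` is hypothetical:

```python
def closure_from_closed_sets(closed, X):
    """Smallest closed set containing X: the intersection of all closed supersets.

    `closed` is assumed to be a Moore family on S, so the intersection
    of its members containing X is again a member.
    """
    X = frozenset(X)
    supersets = [C for C in closed if X <= C]
    if not supersets:
        raise ValueError("no closed set contains X")
    result = supersets[0]
    for C in supersets[1:]:
        result &= C
    return result


# Example 1: closed sets of (J(L), phi) for L = {0, a, b, c, 1}
closed_ex1 = [frozenset(s) for s in ["", "a", "c", "ab", "abc"]]
print(sorted(closure_from_closed_sets(closed_ex1, {"b"})))       # ['a', 'b']
print(sorted(closure_from_closed_sets(closed_ex1, {"a", "c"})))  # ['a', 'b', 'c']
```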

There are infinitely many sets and closure operators whose closure lattice is 
isomorphic to a given L. On the other hand, the one just described is the unique 
one with two additional properties: 

(1) $\phi(\emptyset) = \emptyset$;

(2) $\phi(\{i\})\setminus\{i\}$ is closed, for every $i\in S$.

Condition (2) just says that each $\phi(\{i\})$ is join irreducible. Note that (1) is a special
case of (2), and that (2) implies the property

(3) $\phi(\{i\}) = \phi(\{j\})$ implies $i = j$, for any $i, j\in S$.

Note that $\bigcup_{x\in X}\phi(\{x\})\subseteq\phi(X)$, but the inverse inclusion does not necessarily hold. In Example 1, for instance, $\phi(\{a\})\cup\phi(\{c\})\subsetneq\phi(\{a,c\})$, since b belongs to the right side and not to the left side.

We will call a closure system with properties (1), (2) above a standard closure 
system. Closure systems with (2) are called {T^) closure spaces in Wild [17].

A closure system satisfying property (3) is said to be reduced. Note that (3) implies $|\phi(\emptyset)|\le 1$, since $\phi(\{i\}) = \phi(\emptyset)$ for every $i\in\phi(\emptyset)$. Reduced closure systems correspond to a representation of a lattice L as a closure system on a set S with $J(L)\subseteq S\subseteq L$ and $\phi(X) = [0,\bigvee X]\cap S$.
A natural example is the set of principal congruences in the congruence lattice of 
a finite algebra. Every standard closure system is reduced, and reduced closure 
systems form a useful intermediate ground between standard and general systems. 

It is straightforward to verify that the standard system is characterized by the 
property that the set S is of the smallest possible size. In other words, one cannot 
reduce S to define an equivalent closure system. On the other hand, the reduced 
systems might have excessive elements in S. 

Example 2. Consider again the lattice $L = \{0, a, b, c, 1\}$ from Example 1. It will represent the closure lattice on $S_1 = \{a, b, c, d\}$, where the closed sets are $\emptyset$, $\{a\}$, $\{c\}$, $\{a,b\}$ and $\{a,b,c,d\}$. Thus, in this representation $J(L)\subseteq S_1$, and property (2) fails: $\phi(\{d\})\setminus\{d\} = \{a,b,c\}$ is not closed. On the other hand, property (3) holds; thus, it is a reduced closure system. Apparently, $S_1$ can be reduced by removing the element d, to get an equivalent representation of Example 1.

If the closure system $(S,\phi)$ is not reduced, one can modify it to produce an equivalent one that is reduced. Moreover, there is an effective algorithm for doing so. Thus, for all practical purposes, one can work with a reduced closure system $(U,\mu)$ replacing a given one $(S,\phi)$. Slightly more effort yields an equivalent standard closure system $(V,\nu)$. The transition is described as follows.

If $\phi(\emptyset) = A\neq\emptyset$ in $(S,\phi)$, then define $T = S\setminus A$, and redefine the closure operator: $\tau(Y) = \phi(Y)\setminus A$, for all $Y\subseteq T$. The closure system $(T,\tau)$ satisfies property (1). As (1) is required for a standard closure system, but not for a reduced system, this step may be omitted if only the latter is sought.

Next define an equivalence relation on T by $x\approx y$ if and only if $\tau(x) = \tau(y)$. Then factor out $\approx$, letting $U = T/{\approx}$ and $\mu(Y) = \tau(Y)/{\approx}$ for $Y\subseteq U$. Alternately, we could define U to be a set of representatives for $T/{\approx}$ and $\mu$ to be the restriction of $\tau$. Either way, one easily checks that $\mu$ is a well-defined closure operator on U, and that the closure lattice of $(U,\mu)$ is isomorphic to that of $(S,\phi)$. At this point, $(U,\mu)$ is reduced. Moreover, we can recover the original system $(S,\phi)$ by expanding the equivalence classes and adding back in $\phi(\emptyset)$. If desired, we can now continue to produce an equivalent standard closure system.

Let $V = \{u\in U : \mu(\{u\})\setminus\{u\} \text{ is closed}\}$, that is, $u\notin\mu(\mu(\{u\})\setminus\{u\})$, and for $Z\subseteq V$ let $\nu(Z) = \mu(Z)\cap V$. It is straightforward to verify that $(V,\nu)$ is a closure system satisfying (1) and (2), and that the lattice of closed sets of $(V,\nu)$ is isomorphic to that of $(U,\mu)$.
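The two-stage transition from $(S,\phi)$ to $(U,\mu)$ and then to $(V,\nu)$ can be sketched in Python as follows. This is an illustrative brute-force version under our own assumptions: closure operators are Python callables on `frozenset`s, the closure system is given by its list of closed sets, elements are sortable, and all function names are hypothetical. It follows the "set of representatives" variant above, letting the first element of each $\approx$-class (in sorted order) represent it.

```python
def make_phi(closed):
    """Closure operator of a Moore family `closed` (a list of frozensets)."""
    def phi(X):
        X = frozenset(X)
        result = None
        for C in closed:
            if X <= C:
                result = C if result is None else result & C
        return result
    return phi


def reduce_system(S, phi):
    """(S, phi) -> (U, mu): drop A = phi(empty set), keep one element per tau-class."""
    A = phi(frozenset())
    tau = lambda Y: phi(Y) - A
    representative = {}
    for x in sorted(set(S) - A):
        representative.setdefault(tau({x}), x)   # first element seen represents its class
    U = frozenset(representative.values())
    mu = lambda Y: tau(Y) & U                     # restriction of tau to U
    return U, mu


def standardize(U, mu):
    """(U, mu) -> (V, nu): keep u only when mu({u}) minus {u} is closed."""
    V = frozenset(u for u in U if u not in mu(mu({u}) - {u}))
    nu = lambda Z: mu(Z) & V
    return V, nu
```

On Example 2 the first stage leaves $S_1$ unchanged (the system is already reduced), and the second stage drops d, giving $V = \{a,b,c\}$.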

For the sequel, we will consider primarily reduced closure systems. Given an 
arbitrary closure system, not necessarily reduced, the above reduction can be con- 
sidered as a setup process to allow us to apply the D-basis and related methods. 

3. The bases of closure systems, Horn formulas and Horn Boolean functions

If $y\in\phi(X)$, then this relation between an element $y\in S$ and a subset $X\subseteq S$ in a closure system can be written in the form of an implication: $X\to y$. Thus, the closure system $(S,\phi)$ can be replaced by the set of implications:

$$\{X\to y : y\in S,\ X\subseteq S \text{ and } y\in\phi(X)\}.$$
Conversely, any set of implications Σ defines a closure system: the closed sets are exactly the subsets $Y\subseteq S$ that respect the implications from Σ, i.e., if $X\to y$ is in Σ and $X\subseteq Y$, then $y\in Y$.
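Conversely, given the implications, the closed sets can be enumerated by testing every subset. A brute-force Python sketch (exponential in $|S|$; names hypothetical), with a unit implication $X\to y$ stored as a pair `(frozenset(X), y)`:

```python
from itertools import combinations


def respects(Y, implications):
    """Y respects every X -> y: if X is contained in Y, then y is in Y."""
    return all(y in Y or not X <= Y for X, y in implications)


def closed_sets(S, implications):
    """All subsets of S that respect the given implications."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)
            if respects(frozenset(c), implications)]


# The implications b -> a and {a, c} -> b on S = {a, b, c}
example1 = [(frozenset("b"), "a"), (frozenset("ac"), "b")]
```

For instance, these two implications yield exactly the closed sets $\emptyset$, $\{a\}$, $\{c\}$, $\{a,b\}$, $\{a,b,c\}$ of Example 1.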

It is convenient to define an implication $X\to y$ as any ordered pair $(X,y)$, $X\subseteq S$, $y\in S$, especially having in mind its interpretation as a propositional formula (see two paragraphs below in this section). On the other hand, from the point of view of closure systems, any single implication $X\to x$ with $x\in X$ defines a trivial closure system, where all subsets of S are closed. If such an implication is present in the set of implications Σ, then it can be removed without any change to the family of closed sets that Σ defines. We will assume throughout the paper that implications $X\to x$, where $x\in X$, are not included in the set of implications defining closure systems.

Two sets of implications Σ and Σ' on the same set S are called equivalent if they define the same closure system on S. The term basis is used for a set of implications Σ' satisfying some minimality condition; thus there may be different types of bases.

Note that, in general, one can consider implications of the form $X\to Y$, where Y is not necessarily a one-element subset of S. Following [5], we will call a basis Σ a unit implicational basis if $|Y| = 1$ for all implications $X\to Y$ in Σ. We will mostly be concerned with unit implicational bases, except for the discussion of the canonical basis of Duquenne-Guigues and its comparison with the D-basis and the E-basis. Given any unit basis, we can always collapse the implications with the same premise into one with all conclusions combined into a single set. This will be called an aggregated basis.

For a set of implications $\Sigma = \{X_1\to Y_1,\dots,X_m\to Y_m\}$, define the size by $s(\Sigma) = \sum_{i=1}^{m}(|X_i| + |Y_i|)$. This is one convenient measure of the complexity of an implicational system.
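Both the aggregation and the size $s(\Sigma)$ are straightforward to compute. In the following Python sketch (names hypothetical), a unit basis is a list of pairs `(frozenset(X), y)` and an aggregated basis a list of pairs `(frozenset(X), frozenset(Y))`; the sample data are the unit implications of the D-basis listed in Example 16 below.

```python
from collections import defaultdict


def size(implications):
    """s(Sigma): the sum of |X| + |Y| over implications X -> Y (Y given as a set)."""
    return sum(len(X) + len(Y) for X, Y in implications)


def aggregate(unit_basis):
    """Merge unit implications X -> y that share a premise into one X -> Y."""
    conclusions = defaultdict(set)
    for X, y in unit_basis:
        conclusions[frozenset(X)].add(y)
    return [(X, frozenset(Y)) for X, Y in conclusions.items()]


unit16 = [(frozenset(p), c) for p, c in [
    ("5", "4"), ("14", "3"), ("23", "6"), ("6", "3"), ("15", "6"),
    ("24", "6"), ("24", "5"), ("3", "1"), ("2", "1")]]
print(size([(X, {y}) for X, y in unit16]))               # 23
print(len(aggregate(unit16)), size(aggregate(unit16)))   # 8 21
```

Only the two implications with premise 24 merge, so aggregation saves one premise of size 2.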

In general, implications $X\to y$, where $X\subseteq S$ and $y\in S$, can be treated as formulas of propositional logic over the set of variables S, equivalent to $y\vee\bigvee_{x\in X}\neg x$. Formulae of this form are also called definite Horn clauses. More generally, Horn clauses are disjunctions of negations of several literals and at most one positive literal. The presence of a positive literal makes a Horn clause definite. A Horn formula is a conjunction of Horn clauses.

What is called a model of a definite Horn clause in the logic programming literature corresponds to a closed set of the closure operator defined by this clause. Indeed, by definition, a model of any formula is simply a tuple $m\in 2^S$ of zeros and ones assigned to the literals from S such that the formula is true (=1) on this assignment. For the definite Horn clause $X\to y$, m corresponds to a subset Y of S that is closed for the closure operator on S defined by $X\to y$. In fact, m is just the characteristic function of Y.

There is also a direct correspondence between Horn formulas and Horn Boolean functions: a Boolean function $f:\{0,1\}^n\to\{0,1\}$ is called a (pure or definite) Horn function if it has some CNF representation given by a (definite) Horn formula Σ. The dual definition is sometimes used in the literature, so that a Horn function is given by some formula in DNF whose negation is a Horn formula [6]. Using either definition, one can translate many results on Horn Boolean functions to the language of closure operators, see more details in

Consider a set Σ of Horn clauses over some finite set of literals $S = \{x_1,\dots,x_n\}$. If some Horn clause $\alpha$ in Σ is not definite, i.e., is of the form $\bigvee_{x\in X}\neg x$, and it does not use all literals from S, then we could define the set of definite clauses $\Sigma_\alpha = \{X\to y : y\in S\setminus X\}$. It is easy to observe that the set of models of $\Sigma_\alpha$ consists of all models of $\bigvee_{x\in X}\neg x$ and one additional model, which is the tuple of all ones, representing the set S itself. If some clause $\beta\in\Sigma$ is not definite and uses all the literals from S, then we define $\Sigma_\beta = \emptyset$ (another possibility is $x_1\to x_1$). Again, the set of models of $\Sigma_\beta$, i.e., all tuples of zeros and ones, extends the models of $\beta$ by the single tuple of all ones. It follows that the set of definite clauses Σ', where each non-definite clause $\alpha$ from Σ is replaced by the set of clauses $\Sigma_\alpha$, has a set of models that extends the set of models of Σ by the single tuple of all ones. This includes the case when Σ has no models, i.e., when it is inconsistent.
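The replacement of a non-definite clause can be checked mechanically on a small set of variables. In this Python sketch (our own illustration; names hypothetical), a model is represented by the set of variables assigned 1, a non-definite clause $\bigvee_{x\in X}\neg x$ by the set X, and a definite clause $X\to y$ by a pair `(frozenset(X), y)`:

```python
from itertools import combinations


def subsets(S):
    """All subsets of S; each subset is the set of variables assigned 1."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def definite_replacement(S, X):
    """Definite clauses X -> y, for y in S but not in X, replacing OR_{x in X} (not x).

    When X = S the list is empty, so every assignment becomes a model.
    """
    X = frozenset(X)
    return [(X, y) for y in sorted(set(S) - X)]


def models_of_negative(S, X):
    """Models of the non-definite clause OR_{x in X} (not x)."""
    X = frozenset(X)
    return {M for M in subsets(S) if not X <= M}


def models_of_definite(S, clauses):
    """Models of a conjunction of definite clauses X -> y."""
    return {M for M in subsets(S)
            if all(y in M or not X <= M for X, y in clauses)}
```

Running it for $X = \{a,b\}$ on $S = \{a,b,c\}$ shows that the single clause $ab\to c$ has exactly one more model than $\neg a\vee\neg b$, namely S itself.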

This observation allows us to reduce the solution of various questions about sets 
of Horn clauses to sets of definite Horn clauses. Thus, it emphasizes the importance 
of the study of closure operators on S. 

One of the important questions in logic programming is whether one clause $\varphi$ is a consequence of the set (or conjunction) of clauses Σ. Denoted by $\Sigma\models\varphi$, this means that every model of Σ is also a model of $\varphi$. If $\varphi$ and the formulas in Σ are Horn clauses, then, translating this question to the language of closure systems, one reduces it to checking whether every closed set of the closure system defined by Σ respects $\varphi$.
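For a definite clause this check reduces to a closure computation: it is a standard fact that $\Sigma\models(X\to y)$ holds exactly when $y\in\phi_\Sigma(X)$, the closure of X under Σ. A naive Python sketch (names hypothetical) that reapplies the implications until nothing changes:

```python
def closure(X, implications):
    """phi_Sigma(X): keep applying unit implications (premise, y) until no change."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for premise, y in implications:
            if y not in result and premise <= result:
                result.add(y)
                changed = True
    return frozenset(result)


def entails(implications, X, y):
    """Sigma |= (X -> y) for definite Horn clauses, decided via the closure."""
    return y in closure(X, implications)


# The D-basis of Example 16 (unit implications, in the order listed there)
sigma16 = [(frozenset(p), c) for p, c in [
    ("5", "4"), ("14", "3"), ("23", "6"), ("6", "3"), ("15", "6"),
    ("24", "6"), ("24", "5"), ("3", "1"), ("2", "1")]]
```

With the implications of Example 16 below, for instance, $15\to 3$ is a consequence while $13\to 6$ is not.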

4. The D-basis 

In this section we are going to define a basis that translates into the language of closure systems the defining relations of a finite lattice developed in the lattice theory framework. One can consult [3] for the corresponding notions of a minimal cover and the D-relation used in the theory of free lattices and lower bounded lattices.

Given a reduced closure system $(S,\phi)$, let us define two auxiliary relations. The first relation is between the subsets of S: we write $X\ll Y$ if for every $x\in X$ there is $y\in Y$ satisfying $x\in\phi(y)$. In Example 1, for instance, we have $\{a\}\ll\{b\}$, $\{a,c\}\ll\{b,c\}$ and $\{c\}\ll\{a,c\}$. Note that $X\subseteq Y$ implies $X\ll Y$. We also write $X\approx Y$ if $X\ll Y$ and $Y\ll X$. This is true for $X = \{a,b,c\}$ and $Y = \{b,c\}$ in Example 1.

Several observations are easy. 

Lemma 3. The relation $\ll$ is a quasi-order, and thus $\approx$ is an equivalence relation on $P(S)$.

We will denote the $\approx$-equivalence class containing X by [X]. Note that for any two members $X, Y\in[X]$, we have $\phi(X) = \phi(Y)$. Indeed, $X\ll Y$ implies $X\subseteq\phi(Y)$ and hence $\phi(X)\subseteq\phi(Y)$. The inverse inclusion follows from $Y\ll X$.

There is a natural order on $\approx$-classes: $[X]\le[Y]$ if $X\ll Y$.

Lemma 4. The relation $\le$ is a partial order on the set of $\approx$-equivalence classes.

Each class [X] is ordered itself with respect to set containment. 

In Example 1 we have $\{a,b,c\}\approx\{b,c\}$, and no other subsets are equivalent to $\{a,b,c\}$. Thus, $[\{b,c\}]$ consists of two subsets, and $\{b,c\}\subseteq\{a,b,c\}$ is the minimal (with respect to containment) subset in that $\approx$-equivalence class. Also $\{a,c\}\ll\{b,c\}$, whence $[\{a,c\}]\le[\{b,c\}]$.

Lemma 5. If $(S,\phi)$ is reduced, then each equivalence class [X] has a unique minimal element with respect to the containment order.
Proof. Let us assume that there are two minimal members $X_1$ and $X_2$ in [X]. Without loss of generality we assume that there is $x\in X_1\setminus X_2$. Since $X_1\ll X_2\ll X_1$, we have $x\in\phi(x_2)$ and $x_2\in\phi(x_1)$, for some $x_1\in X_1$, $x_2\in X_2$. We cannot have $x = x_1$ because, if so, then $\phi(x) = \phi(x_2)$, which implies $x = x_2$, since our closure system is reduced. This would contradict the choice of x.

Thus, $x\neq x_1$, and $x\in\phi(x_2)\subseteq\phi(x_1)$. But then we can reduce $X_1$ to $X' = X_1\setminus\{x\}\subsetneq X_1$, which is still a member of [X] since $X_1\ll X'\subseteq X_1$. This contradicts the minimality of $X_1$ in [X]. □
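The relation $\ll$, the class [X] and its unique minimal member from Lemma 5 can all be computed by brute force. A Python sketch under our own conventions: `phi1` is a hypothetical dictionary mapping each element to the closure of its singleton, subsets are `frozenset`s, and the sample data are those of Example 1.

```python
from itertools import combinations


def ll(X, Y, phi1):
    """X << Y: every x in X lies in phi({y}) for some y in Y."""
    return all(any(x in phi1[y] for y in Y) for x in X)


def equiv_class(X, S, phi1):
    """The class [X]: all subsets Y of S with X << Y and Y << X."""
    items = sorted(S)
    subsets = [frozenset(c) for r in range(len(items) + 1)
               for c in combinations(items, r)]
    return [Y for Y in subsets if ll(X, Y, phi1) and ll(Y, X, phi1)]


def minimal_member(cls):
    """The containment-minimal member of a class (unique when the system is reduced)."""
    minimal = [Y for Y in cls if not any(Z < Y for Z in cls)]
    if len(minimal) != 1:
        raise ValueError("no unique minimal member; is the system reduced?")
    return minimal[0]


# Example 1: closures of the singletons
phi1 = {"a": frozenset("a"), "b": frozenset("ab"), "c": frozenset("c")}
```

For Example 1 the class of $\{b,c\}$ is $\{\{b,c\},\{a,b,c\}\}$, and its minimal member is $\{b,c\}$, as stated above.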

The second relation we want to introduce in this section is between an element $x\in S$ and a subset $X\subseteq S$, which will be called a cover of x. (In lattice theory, the terminology nontrivial join cover is used.) We will write $x\lhd X$ if $x\in\phi(X)\setminus\bigcup_{x'\in X}\phi(x')$. This notion is illustrated in Example 1 by $b\lhd\{a,c\}$. Note that it is not true that $a\lhd\{b,c\}$, because $a<b$, so that $a\in\phi(b)$ for the corresponding standard closure operator.

We will call a subset $Y\subseteq S$ a minimal cover of an element $x\in S$ if Y is a cover of x, and for every other cover Z of x, $Z\ll Y$ implies $Y\subseteq Z$. So a minimal cover of x is a cover Y that is minimal with respect to the quasi-order $\ll$ and minimal with respect to set containment within its $\approx$-equivalence class [Y], as per Lemma 5.

To illustrate this notion, let us slightly modify Example 1. Rename the element 0 by d and add a new least element 0 < d, resulting in a lattice $L_1$ with $J(L_1) = J(L)\cup\{d\}$. We will have $Y = \{a,c\}$ as a minimal cover for b. Indeed, the only other cover for b is $Z = \{a,c,d\}$, for which we have $Z\ll Y$ and $Y\subseteq Z$.
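Covers and minimal covers can likewise be found by exhaustive search. The following Python sketch is our own illustration (names hypothetical); it uses the lattice $L_1$ just described, whose closed sets on $J(L_1) = \{a,b,c,d\}$ are $\emptyset$, $\{d\}$, $\{a,d\}$, $\{a,b,d\}$, $\{c,d\}$ and $\{a,b,c,d\}$.

```python
from itertools import combinations


def make_phi(closed):
    """Closure operator of a Moore family given as a list of frozensets."""
    def phi(X):
        X = frozenset(X)
        result = None
        for C in closed:
            if X <= C:
                result = C if result is None else result & C
        return result
    return phi


def is_cover(x, X, phi):
    """x <| X: x lies in phi(X) but in no phi({x'}) with x' in X."""
    return x in phi(X) and all(x not in phi({xp}) for xp in X)


def minimal_covers(x, S, phi):
    """Covers Y of x such that every cover Z of x with Z << Y contains Y."""
    items = sorted(S)
    subsets = [frozenset(c) for r in range(1, len(items) + 1)
               for c in combinations(items, r)]
    covers = [X for X in subsets if is_cover(x, X, phi)]

    def ll(Z, Y):  # Z << Y
        return all(any(z in phi({y}) for y in Y) for z in Z)

    return [Y for Y in covers if all(Y <= Z for Z in covers if ll(Z, Y))]


# Lattice L_1: closed sets on J(L_1) = {a, b, c, d}
closed_L1 = [frozenset(s) for s in ["", "d", "ad", "abd", "cd", "abcd"]]
phi_L1 = make_phi(closed_L1)
```

Here both $\{a,c\}$ and $\{a,c,d\}$ are covers of b, but only $\{a,c\}$ survives the minimality test.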

Lemma 6. For a reduced closure system, if $x\lhd X$, then there exists Y such that $x\lhd Y$, $Y\ll X$ and Y is a minimal cover for x. In other words, every cover can be $\ll$-reduced to a minimal cover.

Proof. Consider $P_x = \{[X'] : x\lhd X'\}$, a sub-poset in the $\le$-poset of $\approx$-classes. The property $x\lhd X'$ depends only on the class $[X']$, since equivalent sets have the same closure and the same union $\bigcup_{x'\in X'}\phi(x')$. As $[X]\in P_x$, we can choose a minimal element $[Y]$ of $P_x$ with $[Y]\le[X]$, and let Y be the unique minimal element in [Y] with respect to containment, which exists due to Lemma 5. Then $Y\ll X$ and $x\lhd Y$. It remains to show that, for every other cover Z of x, $Z\ll Y$ implies $Y\subseteq Z$. Indeed, since $Z\ll Y$, we have $[Z]\le[Y]$. But [Y] is a minimal element in $P_x$; hence, $[Z] = [Y]$. It follows that $Y\subseteq Z$, since Y is the minimal element of [Y] with respect to the containment order. □

We finish this section by introducing the D-basis of a reduced closure system. 

Definition 7. Given a reduced closure system $(S,\phi)$, we define the D-basis $\Sigma_D$ as a union of two subsets of implications:

(1) $\{y\to x : x\in\phi(y)\setminus y,\ y\in S\}$;

(2) $\{X\to x : X \text{ is a minimal cover for } x\}$.

Part (1) in the definition of the D-basis will also be called the binary part of the
basis, due to the fact that both the premise and the conclusion of implications in 
(1) are one-element subsets of S. 

For the closure system $(J(L),\phi)$ associated with the lattice L in Example 1, the D-basis consists of two implications: $b\to a$ and $\{a,c\}\to b$.
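Definition 7 translates directly into a brute-force procedure. The Python sketch below is our own illustration (exponential in $|S|$; names hypothetical) and computes $\Sigma_D$ from an explicit list of closed sets of a reduced closure system. On Example 1 it returns $b\to a$ and $\{a,c\}\to b$, and on the closure system of Example 10 in the next section it returns the ten implications of $\Sigma_D$ listed there.

```python
from itertools import combinations


def make_phi(closed):
    """Closure operator of a Moore family given as a list of frozensets."""
    def phi(X):
        X = frozenset(X)
        result = None
        for C in closed:
            if X <= C:
                result = C if result is None else result & C
        return result
    return phi


def d_basis(S, phi):
    """Sigma_D of a reduced closure system, following Definition 7 literally."""
    items = sorted(S)
    single = {y: phi({y}) for y in items}
    # Z << Y: every z in Z lies in phi({y}) for some y in Y
    ll = lambda Z, Y: all(any(z in single[y] for y in Y) for z in Z)
    # (1) binary part: y -> x for every x in phi({y}) other than y
    basis = [(frozenset({y}), x) for y in items for x in sorted(single[y] - {y})]
    nonempty = [frozenset(c) for r in range(1, len(items) + 1)
                for c in combinations(items, r)]
    for x in items:
        covers = [X for X in nonempty
                  if x in phi(X) and all(x not in single[z] for z in X)]
        # (2) minimal covers: every cover Z with Z << Y contains Y
        basis += [(Y, x) for Y in covers
                  if all(Y <= Z for Z in covers if ll(Z, Y))]
    return basis
```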

Lemma 8. $\Sigma_D$ generates $(S,\phi)$.

Proof. We need to show that, for any $x\in S$ and $X\subseteq S$ such that $x\in\phi(X)\setminus X$, the implication $X\to x$ follows from implications in $\Sigma_D$.

If $x\in\phi(\{x'\})$ for some $x'\in X$, $x'\neq x$, then $X\to x$ follows from $x'\to x$, which is in $\Sigma_D$. So assume that $x\notin\phi(\{x'\})$ for any $x'\in X$. Then $x\lhd X$. According to Lemma 6, there exists $Y\ll X$ such that $x\lhd Y$ and Y is a minimal cover for x. Then $Y\to x$ is in $\Sigma_D$. Besides, for each $y\in Y\setminus X$ there exists $x_y\in X$ such that $y\in\phi(\{x_y\})$. Therefore, $x_y\to y$ is in $\Sigma_D$ as well. Evidently, $X\to x$ is a consequence of $Y\to x$ and $\{x_y\to y : y\in Y\setminus X\}$. □

5. Comparison of the D-basis and the dependence relation basis 

One of the bases discussed in [5] is the dependence relation basis. For a closure system $(S,\phi)$, not necessarily reduced, the dependence relation basis is

$$\Sigma_\delta = \{X\to y : y\in\phi(X)\setminus X \text{ and } y\notin\phi(Z) \text{ for all } Z\subsetneq X\}.$$

Since $Z\subseteq X$ implies $Z\ll X$, a minimal cover (as defined above) is automatically minimal with respect to containment. Thus we have the following connection.

Lemma 9. For a reduced closure system, $\Sigma_D\subseteq\Sigma_\delta$.

For later reference, the dependence relation $\delta$ from Monjardet [15] can be described by $y\,\delta\,x$ whenever $x\in X$ for some $X\to y$ in $\Sigma_\delta$.
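Likewise, $\Sigma_\delta$ can be computed by brute force. In the Python sketch below (our own illustration; names hypothetical), it suffices by isotonicity of $\phi$ to test only the maximal proper subsets $X\setminus\{z\}$ instead of all proper subsets Z. On Example 10 below it returns the 14 implications listed there, which contain the ten implications of $\Sigma_D$, in line with Lemma 9.

```python
from itertools import combinations


def make_phi(closed):
    """Closure operator of a Moore family given as a list of frozensets."""
    def phi(X):
        X = frozenset(X)
        result = None
        for C in closed:
            if X <= C:
                result = C if result is None else result & C
        return result
    return phi


def dependence_basis(S, phi):
    """Sigma_delta: X -> y with y in phi(X) minus X and y outside phi(Z) for Z a proper subset of X."""
    items = sorted(S)
    basis = []
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            X = frozenset(c)
            maximal_proper = [X - {z} for z in X]   # enough, since phi is isotone
            for y in sorted(phi(X) - X):
                if all(y not in phi(Z) for Z in maximal_proper):
                    basis.append((X, y))
    return basis
```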

In the next example and in the sequel, whenever there is no confusion, we will 
omit the braces in notations of subsets of some set S: {x}, {a,b, c}, etc. will be 
denoted simply x, abc, etc. 




Figure 1. Example 10 



Example 10. This example is based on Example 5 from [5]. Consider the closure system on $S = \{1,2,3,4,5\}$ with the set of closed subsets $F = \{\emptyset, 1, 2, 3, 4, 12, 13, 234, 45, 12345\}$. Then $\Sigma_\delta = \{5\to 4, 23\to 4, 24\to 3, 34\to 2, 14\to 2, 14\to 3, 14\to 5, 25\to 1, 35\to 1, 15\to 2, 35\to 2, 15\to 3, 25\to 3, 123\to 5\}$.

All implications except $5\to 4$ are of the form $X\to x$, where $x\lhd X$. On the other hand, not all covers X are minimal covers of x. We can check that none of the implications $15\to 2$, $35\to 2$, $15\to 3$, $25\to 3$ represents a minimal cover. For example, $2\lhd 15$, but $14\ll 15$, and 14 is a minimal cover of 2. In particular, the D-basis consists of all implications from $\Sigma_\delta$ except the four indicated: $\Sigma_D = \{5\to 4, 23\to 4, 24\to 3, 34\to 2, 14\to 2, 14\to 3, 14\to 5, 25\to 1, 35\to 1, 123\to 5\}$.

As this example demonstrates, the D-basis can be obtained from $\Sigma_\delta$ simply by removing some unnecessary implications. It turns out that the same can be done for the broad class of bases called direct unit bases. Moreover, it can be done in time polynomial in the size of the given basis. See Proposition 17 in the next section.



6. Direct basis versus ordered direct basis 

The bases discussed in Bertet and Monjardet [5] are, in general, redundant: a proper subset of such a basis would generate the same closure system. For example, as we saw in the previous section, the basis $\Sigma_\delta$ from Example 10 was reduced to the smaller basis $\Sigma_D$. Example 30 shows that the D-basis can also be redundant; see Remark 31.

While keeping the basis as small as possible is a plausible goal, there is another property of a basis that may be better appreciated in a programming setting. Here we recall the definition of a direct basis.

If Σ is some set of implications, then let $\pi_\Sigma(X) = X\cup\bigcup\{B : A\subseteq X \text{ and } (A\to B)\in\Sigma\}$. In order to obtain $\phi_\Sigma(X)$, for any $X\subseteq S$, one would normally need to repeat several iterations of $\pi = \pi_\Sigma$: $\phi_\Sigma(X) = \pi(X)\cup\pi^2(X)\cup\pi^3(X)\cup\cdots$.

The bases for which one can obtain the closure of any set X by performing only one iteration, i.e., $\phi_\Sigma(X) = \pi_\Sigma(X)$, are called direct.
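A Python sketch of $\pi_\Sigma$ and of the iteration that yields $\phi_\Sigma$ (our own illustration; names hypothetical; an implication $A\to B$ is a pair of `frozenset`s). With the D-basis of Example 10, one pass from $\{1,5\}$ only adds 4, so that basis is not direct:

```python
def pi(X, basis):
    """pi_Sigma(X): X together with B for every A -> B whose premise A lies in X."""
    X = frozenset(X)
    result = set(X)
    for A, B in basis:
        if A <= X:                # premises are tested against the original X only
            result |= B
    return frozenset(result)


def closure_by_iteration(X, basis):
    """Iterate pi until a fixed point; return (closure, number of passes of pi)."""
    current, passes = frozenset(X), 0
    while True:
        nxt = pi(current, basis)
        passes += 1
        if nxt == current:
            return current, passes
        current = nxt


# D-basis of Example 10, with each conclusion wrapped in a frozenset
D10 = [(frozenset(p), frozenset(c)) for p, c in [
    ("5", "4"), ("23", "4"), ("24", "3"), ("34", "2"), ("14", "2"),
    ("14", "3"), ("14", "5"), ("25", "1"), ("35", "1"), ("123", "5")]]
```

From $\{1,5\}$ the first pass gives $\{1,4,5\}$, the second gives $\{1,2,3,4,5\}$, and a third pass confirms the fixed point.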

It follows from Theorem 15 of [5] that the dependence relation basis $\Sigma_\delta$ is direct. Moreover, this basis is direct-optimal, meaning that no other direct basis for the same closure system has smaller total size. (The total size $t(\Sigma)$ is the sum of the cardinalities of all sets participating in its implications. This will be less than $s(\Sigma)$ if some sets are repeated.) In particular, any reduction of $\Sigma_\delta$ will cease to be direct. Thus, there is an apparent trade-off between the number of implications in the basis and the number of iterations one needs to compute the closures of subsets.

The goal of this section is to implement a different approach to the concept of iteration. It requires the same number of programming steps as the iteration of $\pi$, while allowing us to reduce the bases to a smaller size.

Definition 11. Suppose the set of implications Σ is equipped with some linear order <, or equivalently, the implications are indexed as $\Sigma = \{s_1, s_2,\dots,s_n\}$. Define a mapping $\rho_\Sigma: P(S)\to P(S)$ associated with this ordering as follows. For any $X\subseteq S$, let $X_0 = X$. If $X_k$ is computed and implication $s_{k+1}$ is $A\to B$, then

$$X_{k+1} = \begin{cases} X_k\cup B, & \text{if } A\subseteq X_k,\\ X_k, & \text{otherwise.}\end{cases}$$

Finally, $\rho_\Sigma(X) = X_n$. We will call $\rho_\Sigma$ an ordered iteration of Σ.
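The ordered iteration differs from $\pi_\Sigma$ only in testing each premise against the current set $X_k$. A Python sketch (our own illustration; the name `rho` is hypothetical; implications are pairs of `frozenset`s):

```python
def rho(X, ordered_basis):
    """Ordered iteration rho_Sigma (Definition 11): a single pass over the list,
    testing each premise A against the current X_k rather than the original X."""
    current = set(X)
    for A, B in ordered_basis:        # s_1, s_2, ..., s_n in the given order
        if A <= current:
            current |= B              # X_{k+1} = X_k U B
    return frozenset(current)


# D-basis of Example 10 in the order listed there (binary part 5 -> 4 first)
D10 = [(frozenset(p), frozenset(c)) for p, c in [
    ("5", "4"), ("23", "4"), ("24", "3"), ("34", "2"), ("14", "2"),
    ("14", "3"), ("14", "5"), ("25", "1"), ("35", "1"), ("123", "5")]]
```

With $5\to 4$ first, a single pass from $\{1,5\}$ reaches $\{1,2,3,4,5\}$; moving $5\to 4$ to the end of the list yields only $\{1,4,5\}$.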



Apparently, $\pi_\Sigma(X)\subseteq\rho_\Sigma(X)$, because all implications from Σ are applied to the original subset X, while they are applied to potentially bigger subsets $X_k$ in the construction of $\rho_\Sigma(X)$. We note, though, that once the order on Σ is established, the number of computational steps to produce $\rho_\Sigma(X)$ is the same as for $\pi_\Sigma(X)$.

Definition 12. The set of implications with some linear ordering on it, $(\Sigma,<)$, is called an ordered direct basis if, with respect to this ordering, $\phi_\Sigma(X) = \rho_\Sigma(X)$ for all $X\subseteq S$.

Our next goal is to demonstrate that $\Sigma_D$ is, in fact, an ordered direct basis. Moreover, it does not take much computational effort to impose a proper ordering on $\Sigma_D$.
Theorem 13. Let $\Sigma_D$ be the D-basis for a reduced closure system. Let < be any linear ordering on $\Sigma_D$ such that all implications of the form $y\to z$ precede all implications of the form $X\to x$, where X is a minimal cover of x. Then, with respect to this ordering, $\Sigma_D$ is an ordered direct basis.

Proof. Suppose that $X\subseteq S$ and $b\in\phi(X)\setminus X$. We want to show that b will appear in one of the $X_k$ in the sequence that leads to $\rho(X)$.

If $b\in\phi(\{a\})\setminus\{a\}$ for some $a\in X$, then b will appear in some $X_k$ when $a\to b$ from $\Sigma_D$ is applied. So now assume that $b\notin\phi(\{a\})$ for every $a\in X$. Then $b\lhd X$ and, according to Lemma 6, there exists $Y\ll X$ such that $b\lhd Y$ and Y is a minimal cover for b. It follows that for any $y\in Y$ there exists $a\in X$ such that $y\in\phi(a)$. All implications $a\to y$ (with $y\neq a$) will be applied prior to any application with the minimal cover. It follows that by the time the implication $s_k$, say $Y\to b$, is tested against $X_{k-1}$, we will have $Y\subseteq X_{k-1}$. Hence, $X_k = X_{k-1}\cup\{b\}$. □

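Corollary 14. The D-basis is also ordered direct in its aggregated form.

Indeed, this follows from the fact that the only restriction on the order of the D-basis is to have its binary part prior to the rest of the basis, and aggregation never merges a binary implication with a non-binary one: a cover of x always has at least two elements, since $x\lhd\{y\}$ would require $x\in\phi(y)\setminus\phi(y)$.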

Corollary 15. If $\Sigma_D = \{s_1,\dots,s_m\}$ is the D-basis of a reduced implicational system Σ, then it requires time $O(m)$ to turn it into an ordered direct basis of Σ.
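The $O(m)$ bound comes from a single pass that separates the binary part from the rest. A Python sketch, our own illustration with the hypothetical name `order_d_basis`, implemented as a stable partition so that the input order is kept inside each part. Theorem 13 permits any order inside the parts when the binary part is the full set from Definition 7. Applied to the D-basis in the order listed in Example 16 below, it produces exactly the ordering (1) to (9) given there.

```python
def order_d_basis(d_basis):
    """Linear-time reordering: binary implications (one-element premise) first.

    A stable partition: the relative order inside each part is preserved.
    """
    binary, rest = [], []
    for A, b in d_basis:
        (binary if len(A) == 1 else rest).append((A, b))
    return binary + rest


# D-basis of Example 16, in the order in which it is listed there
listed16 = [(frozenset(p), c) for p, c in [
    ("5", "4"), ("14", "3"), ("23", "6"), ("6", "3"), ("15", "6"),
    ("24", "6"), ("24", "5"), ("3", "1"), ("2", "1")]]
```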

Example 16. Consider the closure system with $S = \{1,2,3,4,5,6\}$ and the family of closed sets $F = \{\emptyset, 1, 12, 13, 4, 45, 134, 136, 1236, 1346, 13456, 123456\}$. Then the D-basis of this system is $\Sigma_D = \{5\to 4, 14\to 3, 23\to 6, 6\to 3, 15\to 6, 24\to 6, 24\to 5, 3\to 1, 2\to 1\}$. According to Theorem 13, a proper ordering that turns this basis into an ordered direct one can be defined, for example, as: (1) $5\to 4$, (2) $6\to 3$, (3) $3\to 1$, (4) $2\to 1$, (5) $14\to 3$, (6) $23\to 6$, (7) $15\to 6$, (8) $24\to 6$, (9) $24\to 5$.
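Definition 12 can be verified for this example by comparing $\rho_\Sigma(X)$ with the closure of X for all 64 subsets of S. A brute-force Python sketch (our own illustration; names hypothetical) confirming the ordering above, and showing that placing the binary implications last breaks the single pass, e.g., for $X = \{2,5\}$:

```python
from itertools import combinations


def rho(X, ordered_basis):
    """Ordered iteration: one pass, premises tested against the growing set."""
    current = set(X)
    for A, b in ordered_basis:
        if A <= current:
            current.add(b)
    return frozenset(current)


def closure_from_closed(closed, X):
    """Intersection of all closed sets containing X."""
    X = frozenset(X)
    result = None
    for C in closed:
        if X <= C:
            result = C if result is None else result & C
    return result


def is_ordered_direct(S, ordered_basis, closed):
    """Definition 12, checked by brute force over all subsets of S."""
    items = sorted(S)
    return all(rho(c, ordered_basis) == closure_from_closed(closed, c)
               for r in range(len(items) + 1)
               for c in combinations(items, r))


closed16 = [frozenset(s) for s in ["", "1", "12", "13", "4", "45", "134",
                                   "136", "1236", "1346", "13456", "123456"]]
order16 = [(frozenset(p), c) for p, c in [
    ("5", "4"), ("6", "3"), ("3", "1"), ("2", "1"), ("14", "3"),
    ("23", "6"), ("15", "6"), ("24", "6"), ("24", "5")]]
```

With the binary part moved to the end, one pass from $\{2,5\}$ stops at $\{1,2,4,5\}$, whereas the closure is all of S.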




Figure 2. Example 
7. Processing of ordered basis versus forward chaining algorithm

The forward chaining algorithm was originally introduced in 1984 by W. Dowling and J.H. Gallier in the context of checking the satisfiability of Horn formulae [S]. In 1992, H. Mannila and K.J. Räihä introduced the LINCLOSURE algorithm, which applies the same approach to expanding functional dependencies in database systems. In this section, we will look at the efficiency of this approach in comparison with a folklore algorithm for computing closures with the D-basis, an approach that can be generalized to any direct or ordered direct basis.

We will assume that the base set is $S = \{x_1,\dots,x_n\}$, which can be interpreted as propositional variables, and the closure system is given by a unit basis $\Sigma = \{A_1\to b_1,\dots,A_m\to b_m\}$.

The forward chaining procedure requires a pre-processing setup, during which it constructs three data structures: $\mathrm{ClauseList}_i = \{A_j : x_i\in A_j\}$ for $i\le n$, $\mathrm{Propositions}_j = |A_j|$ and $\mathrm{Consequent}_j = \{b_j\}$ for $j\le m$, along with the subset $\mathrm{True}\subseteq S$, thought of as the input set whose closure needs to be computed.

When forward chaining computes the closure of True, for each new $x_i\in\mathrm{True}$ and each $A_j\in\mathrm{ClauseList}_i$, it decrements the value of $\mathrm{Propositions}_j$ by one. Whenever $\mathrm{Propositions}_j = 0$, $\mathrm{Consequent}_j$ is added to the set True.
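The counter-based procedure just described can be sketched in Python as follows. This is our own illustration of the technique, not the code used for the experiments; `clause_list`, `propositions` and the function name are hypothetical stand-ins for ClauseList, Propositions and Consequent. Each element enters True once, and each premise counter is decremented at most $|A_j|$ times, which gives the $O(s(\Sigma))$ bound discussed next.

```python
from collections import defaultdict, deque


def forward_chaining(true_init, basis):
    """Closure of true_init under unit implications (A_j, b_j), using counters."""
    clause_list = defaultdict(list)      # x -> indices j with x in A_j
    propositions = []                    # unsatisfied premise elements of A_j
    for j, (A, _) in enumerate(basis):
        propositions.append(len(A))
        for x in A:
            clause_list[x].append(j)

    true = set()
    queue = deque(true_init)
    queue.extend(b for A, b in basis if not A)   # empty premises fire at once
    while queue:
        x = queue.popleft()
        if x in true:
            continue
        true.add(x)
        for j in clause_list[x]:
            propositions[j] -= 1
            if propositions[j] == 0:
                queue.append(basis[j][1])        # Consequent_j
    return frozenset(true)


# D-basis of Example 16, in no particular order
sigma16 = [(frozenset(p), c) for p, c in [
    ("5", "4"), ("14", "3"), ("23", "6"), ("6", "3"), ("15", "6"),
    ("24", "6"), ("24", "5"), ("3", "1"), ("2", "1")]]
```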

Since every entry of Propositions will, in the worst case, be reduced to zero, the number of steps in computing the closure is bounded by the size $s(\Sigma)$, i.e., the combined length of the implications in the basis. Including the pre-processing steps, the forward chaining algorithm should require $O(s(\Sigma))$ operations to compute the closure. If the closures of multiple sets are to be computed, of course, the setup steps can be abbreviated: only Propositions and True need to be updated for subsequent runs.

As noted in [18], forward chaining, while efficient in the worst case, generally underperforms the folklore algorithm of simply checking whether each $A_j$ is contained in True and, if so, appending $b_j$ to True, until the algorithm can no longer generate new elements of True. In particular, forward chaining does poorly on large sets, where we often need to examine only a fraction of the $|\Sigma|$ variables examined in the forward chaining procedure.
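A Python sketch of the folklore algorithm follows; it also reports the number of sweeps over the basis, the last of which only confirms that nothing new is produced. The example uses the D-basis of Example 16 in the order in which it was first listed; the names are ours.

```python
def folklore_closure(basis, true_set):
    """Sweep the whole basis repeatedly until a sweep adds nothing new.

    Returns the closure together with the number of sweeps performed.
    """
    true = set(true_set)
    sweeps = 0
    changed = True
    while changed:
        changed = False
        sweeps += 1
        for premise, b in basis:
            if b not in true and premise <= true:  # A_j contained in True
                true.add(b)
                changed = True
    return true, sweeps


# D-basis of Example 16, in the order in which it was first listed.
D16 = [(frozenset(p), b) for p, b in [
    ({5}, 4), ({1, 4}, 3), ({2, 3}, 6), ({6}, 3), ({1, 5}, 6),
    ({2, 4}, 6), ({2, 4}, 5), ({3}, 1), ({2}, 1)]]

closure, sweeps = folklore_closure(D16, {2, 4})
print(sorted(closure), sweeps)  # [1, 2, 3, 4, 5, 6] 3
```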

As an alternative to the forward chaining procedure, M. Wild [18] suggested an algorithm that considers the set difference $\Sigma' = \Sigma \setminus \{A_j \to b_j : A_j \not\subseteq \mathrm{True}\}$. For each $(A_j \to b_j) \in \Sigma'$ it then adds $b_j$ to True, and repeats as necessary. This algorithm retains the need for preprocessing in the form of ClauseLists. Though typically faster than forward chaining, Wild's algorithm has a worst-case running time of $O(s(\Sigma)m^2)$, which can cause problems for large values of $m$.
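The Python sketch below renders only the idea stated above: in each round it fires every remaining implication of $\Sigma'$ and then discards it. It deliberately omits the ClauseList preprocessing of Wild's actual algorithm [18], so it should not be read as his implementation; the function name is hypothetical.

```python
def wild_style_closure(basis, true_set):
    """Simplified sketch: fire every remaining implication whose premise
    lies in True, discard the fired implications, and repeat."""
    true = set(true_set)
    remaining = list(basis)
    while True:
        ready = [imp for imp in remaining if imp[0] <= true]  # this is Sigma'
        if not ready:
            return true
        # Keep only the implications that did not fire in this round.
        remaining = [imp for imp in remaining if not imp[0] <= true]
        true.update(b for _, b in ready)


# D-basis of Example 16, in the order in which it was first listed.
D16 = [(frozenset(p), b) for p, b in [
    ({5}, 4), ({1, 4}, 3), ({2, 3}, 6), ({6}, 3), ({1, 5}, 6),
    ({2, 4}, 6), ({2, 4}, 5), ({3}, 1), ({2}, 1)]]

print(sorted(wild_style_closure(D16, {1, 5})))  # [1, 3, 4, 5, 6]
```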

Applying the folklore algorithm to ordered bases, in theory, avoids the pitfalls of both forward chaining and Wild's algorithm. It simply iterates from $A_1 \to b_1$ to $A_m \to b_m$, adding $b_i$ to True whenever $A_i \subseteq \mathrm{True}$. On the one hand, its worst-case processing time is $O(s(\Sigma))$, since we only need to iterate through the ordered basis once. At the same time, it takes a fraction of the time of the folklore approach on a non-ordered basis, which requires at least two iterations in order to confirm that no new variables were added to True.
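The one-pass procedure, and a brute-force test of whether a given order is direct, can be sketched in Python as follows. The sketch uses the D-basis of Example 16 in the proper order (1)-(9), and the same implications in the order in which they were first listed; the helper names are ours.

```python
from itertools import combinations


def ordered_closure(ordered_basis, true_set):
    """One pass over an ordered basis: A_i -> b_i fires if A_i is in True."""
    true = set(true_set)
    for premise, b in ordered_basis:
        if premise <= true:
            true.add(b)
    return true


def fixpoint_closure(basis, true_set):
    """Reference closure: apply the basis until nothing changes."""
    true = set(true_set)
    changed = True
    while changed:
        changed = False
        for premise, b in basis:
            if b not in true and premise <= true:
                true.add(b)
                changed = True
    return true


def is_ordered_direct(ordered_basis, ground):
    """Check that one ordered pass yields the closure of every subset."""
    ground = sorted(ground)
    return all(
        ordered_closure(ordered_basis, X) == fixpoint_closure(ordered_basis, X)
        for r in range(len(ground) + 1)
        for X in map(set, combinations(ground, r)))


def imp(premise, b):
    return (frozenset(premise), b)


# Example 16: the D-basis in the proper order (1)-(9) ...
ORDERED_D16 = [imp({5}, 4), imp({6}, 3), imp({3}, 1), imp({2}, 1),
               imp({1, 4}, 3), imp({2, 3}, 6), imp({1, 5}, 6),
               imp({2, 4}, 6), imp({2, 4}, 5)]
# ... and the same implications in the order in which they were first listed.
LISTED_D16 = [imp({5}, 4), imp({1, 4}, 3), imp({2, 3}, 6), imp({6}, 3),
              imp({1, 5}, 6), imp({2, 4}, 6), imp({2, 4}, 5),
              imp({3}, 1), imp({2}, 1)]

print(is_ordered_direct(ORDERED_D16, range(1, 7)))  # True
print(is_ordered_direct(LISTED_D16, range(1, 7)))   # False
```

On the listed order, a single pass from $\{2, 4\}$ stops at $\{1, 2, 4, 5, 6\}$, because $14 \to 3$ is attended before $2 \to 1$ has produced 1.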

In testing the performance of these three algorithms, we generated D-bases from the domains $\{1, 2, 3, 4, 5\}$ through $\{1, 2, 3, 4, 5, 6, 7, 8\}$ and calculated the time necessary to derive the closure of a random subset of the domain. For forward chaining and Wild's algorithm, which require substantial preprocessing, we calculated the time with and without the preprocessing, which corresponds to the time required for computing the first closure on a basis versus the time for subsequent closures.

Average Time (μs)

Algorithm                            Domain 5   Domain 6   Domain 7   Domain 8
Folklore                                 3.87       6.59      10.12      14.90
Forward Chaining (preprocessed)          7.74      11.32      15.99      21.58
Forward Chaining                        30.34      46.10      66.74      91.48
Wild's Algorithm (preprocessed)         10.14      14.51      19.66      26.16
Wild's Algorithm                        23.75      35.28      50.11      69.58

Table 1. Comparing Algorithm Processing Times

In our testing, the folklore algorithm considerably outperformed both forward chaining and Wild's algorithm (without preprocessing), though its advantage shrank as the domain grew larger. For a domain of size 5, forward chaining and Wild's algorithm took 2 and 2.66 times as long as folklore, respectively; these ratios shrank to 1.45 and 1.76 as the domain grew to size 8. If we include preprocessing times, however, both algorithms continued to take over 4 times as long, with the relative time of forward chaining remaining roughly constant. Domains of sizes 5-8 corresponded to bases with an average of 8, 13, 19 and 27 implications, respectively.

Taking into account that LINCLOSURE and Wild's algorithm are normally performed on aggregated bases, we also ran a series of similar tests with aggregated bases. Such a test is possible because the D-basis is ordered direct in both the unit and the aggregated form. The bases on domains of sizes 5-8 had an average of 5, 7, 10 and 13 implications, respectively. The tests showed even higher ratios of forward chaining times (without preprocessing) to ordered direct processing times: from 3.04 for domain 5 to 2.1 for domain 8.

Notably, the ordered-basis approach does not actually require the representation of propositions as $S = \{x_1, \dots, x_n\}$ and implications as $\Sigma = \{s_1, \dots, s_m\}$, where each proposition has an associated integer value, necessary for indexing and traversing ClauseList, Propositions, and Consequent. Though we can in principle take advantage of integer values in constructing our set of true values, we only require a set of satisfied propositions. By contrast, using the forward chaining method on a basis without this representation would require significant overhead in hashing each proposition to its corresponding integer.

Additionally, the ordered-basis approach eliminates the need for pre-processing the basis to store it in the form of ClauseList and Consequent. Since the D-basis is defined as the union of binary and non-binary sets of implications, which is reflected in the algorithm for producing it, we assume that all D-bases are properly ordered. This is particularly important when the basis may not fit into main memory. Instead of having to access each $\mathrm{ClauseList}_i$ individually whenever the propositional variable $x_i$ appears in True, the ordered-basis approach allows us to parse the basis in conveniently sized pieces.

There is at least one way in which the idea of the ordered basis may improve the performance of the forward chaining algorithm. Indexing the implications according to the proper order of the D-basis, whenever we add a variable to True, we may additionally record the index $j$ of the implication from which it was derived. Then, when we process this variable, we only need to update the entries $\mathrm{Propositions}_k$ with $k > j$, saving significant processing time for very large sets.
8. Building and optimizing the D-basis

We consider the D-basis a good alternative to any direct basis, since it has a smaller size than any direct basis and preserves the directness property under a special ordering that we define. In this section we consider an effective procedure to obtain the D-basis from any given direct basis, as well as ways to further optimize its binary part and to use the concept of an ordered sequence. The problem of obtaining the D-basis from other, non-direct bases is also tackled in [3].

As we saw in Lemma 9 and Example 10, the D-basis $\Sigma_D$ of any reduced closure system is a subset of the canonical direct unit basis $\Sigma_{cd}$. The next statement shows that, given any direct unit basis, one can extract the D-basis from it by a polynomial-time procedure.

Proposition 17. Let $(S, \phi)$ be a reduced closure system. If the direct unit basis $\Sigma$ for this system has $m$ implications, and $|S| = n$, then it requires time $O((nm)^2) = O(s(\Sigma)^2)$ to build the D-basis $\Sigma_D$ equivalent to $\Sigma$.

Proof. Let $(S, \phi)$ be the closure system on set $S$ defined by $\Sigma$. By Lemma 9, $\Sigma_D \subseteq \Sigma_{cd}$. According to Theorem 15 of [5], $\Sigma_{cd}$ coincides with the canonical iteration-free basis introduced by M. Wild in [17]. Hence, by Corollary 17 of [5], $\Sigma_{cd}$ is the smallest basis, with respect to containment, of all direct unit bases of $(S, \phi)$. Therefore, $\Sigma_D \subseteq \Sigma_{cd} \subseteq \Sigma$.

It follows that $\Sigma_D$ can simply be extracted from $\Sigma$ by removing unnecessary implications. This amounts to finding, among the implications of $\Sigma$, the implications $X \to x$ where $X$ is a minimal join cover of $x$.

Note that $O(m)$ steps will be needed to separate binary implications $y \to x$ from implications $X \to x$ with $|X| > 1$. The number of $x \in S$ that appear in the conclusion of implications $X \to x$ is at most the minimum of $m$ and $n$.

For every fixed $x$, it will take time $O(m)$ to separate all implications $X \to x$, and the number of such implications is at most $m$. If $X_1 \to x$ and $X_2 \to x$ are two implications in this set, we can decide in time $O(mn)$ whether $X_1 \ll X_2$ or $x \in \phi(y)$ for some $y \in X_2$. If either holds, $X_2 \to x$ does not belong to the D-basis. To check this, consider the sets of implications $\Sigma_i \subseteq \Sigma$, $i = 1, 2$, that consist of all binary implications of $\Sigma$ in addition to $X_i \to x$. Also, put an order on $\Sigma_i$ in which all the binary implications precede $X_i \to x$. Then $x$ is in the closure of $X_2$, in the closure system defined on $S$ by $\Sigma_1$, iff either $X_1 \ll X_2$ or $x \le y$ for some $y \in X_2$.

As pointed out in Section 7, computation of the closure of any input set, either by the forward chaining algorithm or by the ordered basis algorithm, is linear in the size of the input, which in this case is essentially the size of the binary part of $\Sigma$, or $O(n^2)$.

In the worst case, about $O(m^2)$ comparisons have to be made, for different covers $X_1, X_2$ of the same element $x$, to determine the minimal ones. Hence, the overall complexity is $O(m^2 n^2) = O(s(\Sigma)^2)$. □

It follows from the procedure of Proposition 17 that the D-basis is obtained from any direct unit basis by removing the implications $X \to x$ for which $|X| > 1$ and $X$ is not a minimal cover of $x$. In particular, the binary part of the direct basis, i.e., the implications of the form $y \to x$, remains in the D-basis.
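A Python sketch of this extraction, under the stated assumption that the input is a direct unit basis of a reduced closure system, so that its binary part already lists $y \to x$ for every $x \in \phi(y) \setminus y$. It keeps the binary part and, for each $x$, those antichain premises $X$ with $x \notin \phi(y)$ for all $y \in X$ that are not refined by another such premise. Discarding non-antichain premises (a minimal cover is always an antichain), the helper names and the sample input are our additions. The sample is the D-basis of Example 16 with $6 \to 1$ added (since $1 \in \phi(6)$) and three non-minimal implications $35 \to 6$, $25 \to 6$, $25 \to 3$ appended by hand; it is not a complete direct basis, but it exercises the filter.

```python
def imp(premise, b):
    return (frozenset(premise), b)


def extract_d_basis(direct_unit_basis):
    """Keep the binary part and the X -> x with X a minimal cover of x."""
    down = {}  # down[y]: y together with every x such that y -> x
    for X, x in direct_unit_basis:
        if len(X) == 1:
            (y,) = X
            down.setdefault(y, {y}).add(x)

    def below(y):
        return down.get(y, {y})

    def refines(A, B):  # A << B: each a in A lies below some b in B
        return all(any(a in below(b) for b in B) for a in A)

    def is_antichain(X):
        return not any(a != b and a in below(b) for a in X for b in X)

    covers = {}  # nontrivial antichain covers of each x found in the basis
    for X, x in direct_unit_basis:
        if len(X) > 1 and is_antichain(X) and not any(x in below(y) for y in X):
            covers.setdefault(x, []).append(X)

    binary = [(X, x) for X, x in direct_unit_basis if len(X) == 1]
    minimal = [(X, x) for x, Xs in covers.items() for X in Xs
               if not any(Y != X and refines(Y, X) for Y in Xs)]
    return binary + minimal


D16 = [imp({5}, 4), imp({1, 4}, 3), imp({2, 3}, 6), imp({6}, 3),
       imp({1, 5}, 6), imp({2, 4}, 6), imp({2, 4}, 5), imp({3}, 1), imp({2}, 1)]
EXTRA = [imp({6}, 1), imp({3, 5}, 6), imp({2, 5}, 6), imp({2, 5}, 3)]

result = extract_d_basis(D16 + EXTRA)
print(len(result))  # 10: the nine implications of D16 plus 6 -> 1
```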

We want to discuss a further optimization of the D-basis, as well as of any other basis that has the same binary part as the D-basis. As was observed in Section 2, for a reduced closure system $(S, \phi)$, the elements of $S$ can be identified with elements of the closure lattice $L$ in such a way that $J(L) \subseteq S \subseteq L$. This correspondence induces a natural order on $S$, with $s \le t$ if and only if $\phi(s) \subseteq \phi(t)$. Thus, an implication $y \to x$ belongs to the D-basis iff $x \in \phi(y)$ iff $x \le y$. The binary part of the D-basis then describes the partially ordered set $(S, \le)$.

Recall that, in the language of ordered sets, we say that $y$ covers $x$ if $y > x$ and there is no element $z$ such that $y > z > x$.

We can shorten the binary part of the D-basis, leaving only those implications $y \to x$ for which $y$ covers $x$ in $(S, \le)$. This will come at the cost of having to order the remaining implications. For example, if $x \to y$, $y \to z$, $x \to z$ are three implications from the binary part of some D-basis, then the last implication can be removed, provided that the first two are placed in that particular order in the ordered D-basis. More generally, suppose only covering pairs are to be included in the binary part of an ordered basis. Then the ordering of the implications should be such that, if $x > y \ge z > t$ in $S$ with the strict inequalities being covers, then $x \to y$ precedes $z \to t$.

Recall also that if some set of implications $\Sigma'$ is ordered, then $\rho_{\Sigma'}(X)$, the ordered iteration of $\Sigma'$, is defined for every $X \subseteq S$; see Definition 11.

Proposition 18. Let $\Sigma_1$ be the binary part of the D-basis of a reduced closure system on a set $S$. If $\Sigma_1$ has $k$ implications and $|S| = n$, then there is an $O(nk + n^2)$ time algorithm that extracts $\Sigma' \subseteq \Sigma_1$ describing the cover relation of the join-irreducible elements of the closure system, and places the implications of $\Sigma'$ into a proper order. Under this order, $\rho_{\Sigma'}(y) = \rho_{\Sigma_1}(y)$ for every $y \in S$.

Proof. We have the partially ordered set $(S, \le)$ of size $n$, whose cover relation has at most $k$ pairs; thus, it will take time $O(nk + n^2)$ to find the cover relation of this poset, see [10], also Theorem 11.3 in [9]. Let $\Sigma' \subseteq \Sigma_1$ be the set of all implications $y \to x$ where $y$ covers $x$ in $(S, \le)$. It remains to put these implications into a proper order. If $(S, \le_1)$ is any linear extension of $(S, \le)$, then one can take any order of $\Sigma'$ associated with this extension: starting from the maximal element $y$ of $(S, \le_1)$, write all implications $y \to x$ from $\Sigma'$, in any order; then pick the next-to-maximal element $z$ of $(S, \le_1)$ and write all implications $z \to t$, in any order; then proceed with all elements of $(S, \le_1)$ in the same manner, in descending order $\ge_1$. It remains to notice that there is an $O(n + k)$ algorithm for producing a linear extension of a partially ordered set with $n$ elements and $k$ pairs of comparable elements, see Theorem 11.1 in [5]. □
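The two steps of this proof can be sketched in Python as follows. The cover test is a plain quadratic scan rather than the $O(nk + n^2)$ method cited above, and the linear extension comes from `graphlib.TopologicalSorter` in the Python standard library (3.9 and later). Implications are given as pairs `(y, x)` standing for $y \to x$, the binary part is assumed to be transitively closed, and the function names are ours.

```python
from graphlib import TopologicalSorter


def ordered_cover_part(binary):
    """Keep the covering pairs of a binary part and put them in proper order.

    binary: list of pairs (y, x) standing for y -> x, i.e. x < y; assumed to
    be transitively closed, as the binary part of a D-basis is.
    """
    below = {}
    for y, x in binary:
        below.setdefault(y, set()).add(x)
    # y -> x is a cover unless x < z < y for some z.
    covers = [(y, x) for y, x in binary
              if not any(x in below.get(z, set()) for z in below[y])]
    # Linear extension listing larger elements first: y is emitted before x.
    sorter = TopologicalSorter()
    for y, x in covers:
        sorter.add(x, y)
    position = {v: i for i, v in enumerate(sorter.static_order())}
    # Group the implications by premise, in descending linear-extension order.
    return sorted(covers, key=lambda c: position[c[0]])


def single_pass(ordered, y):
    """Ordered iteration of binary implications, started from {y}."""
    true = {y}
    for a, b in ordered:
        if a in true:
            true.add(b)
    return true


# Binary part of the D-basis in Example 21: 3 -> 2, 2 -> 1, 3 -> 1.
print(ordered_cover_part([(3, 2), (2, 1), (3, 1)]))  # [(3, 2), (2, 1)]
```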

Now we want to deviate slightly from the notion of ordered direct basis to the notion of ordered direct sequence of implications. Suppose $\Sigma$ is some basis of a closure system $(S, \phi)$. An ordered sequence $\sigma = (s_1, \dots, s_t)$ of implications from $\Sigma$, not necessarily all different, is called an ordered direct sequence from $\Sigma$ if $\rho_\sigma(X) = \phi(X)$ for every $X \subseteq S$.

The idea of ordered direct sequencing allows some further optimization of the D-basis. If $Z = (z_1, \dots, z_k)$ and $T = (t_1, \dots, t_l)$ are two ordered sequences, then $Z^\frown T$ denotes their concatenation (the attachment of $T$ at the end of $Z$).

Lemma 19. Suppose $\sigma = \Sigma_1^\frown\Sigma_2^\frown\Sigma_3$ is an ordered direct sequence from some basis $\Sigma$, where $\Sigma_1$, $\Sigma_3$ consist of binary implications in the proper order of Proposition 18, $\Sigma_2$ consists of non-binary implications, and $\Sigma_2$ can be put into arbitrary order without changing the ordered direct status. If $(A \to y), (A \to x) \in \Sigma_2$ and $(y \to x) \in \Sigma_1$, then $A \to x$ can be dropped from $\Sigma_2$ and replaced by an additional $y \to x$ in $\Sigma_3$.

Proof. We need to show that whenever $Y$ is an input set such that $x \in \phi(Y)$, the replacement of $A \to x$ by $y \to x$ will not affect the computation of $\rho_\sigma(Y)$.

Consider the case when $y \notin \phi(Y)$. Then also $A \not\subseteq \phi(Y)$, whence any implication with the premise $A$ will never be applied in the computation of $\rho_\sigma(Y)$. The same is true for implications with premise $y$, so the replacement of $A \to x$ by $y \to x$ can trivially be done.

Now suppose that $y \in \phi(Y)$. By assumption, we can take $A \to x$ to be the last implication in the ordering of $\Sigma_2$. So consider $Y_k$, the result of the ordered iteration of $\Sigma_1^\frown(\Sigma_2 \setminus \{A \to x\})$ on the input set $Y$. If $y \in Y_k$, then we can drop $A \to x$ from $\Sigma_2$ and place $y \to x$ anywhere in the proper order in $\Sigma_3$, which will guarantee that $x$ appears in $\rho_\sigma(Y)$. If $y \notin Y_k$, then there is $z \in Y_k$ such that there exists some sequence of implications in $\Sigma_3$ from $z$ to $y$. By assumption, $\Sigma_3$ is in the proper order, hence any implication $w \to y$ precedes $x \to t$. Thus, we can place $y \to x$ between those groups, following the proper order on all binary implications from Proposition 18. After replacing $A \to x$ by $y \to x$ in the proper position of $\Sigma_3$, we can still assume that the ordering of the remaining part of $\Sigma_2$ can be arbitrary. □

Corollary 20. Suppose $\Sigma_D$ is the D-basis of some closure system. Consider $\Sigma' \subseteq \Sigma_D$ obtained from $\Sigma_D$ by performing the following reductions:

(a) Remove $A \to x$ if $A \to y$ and $y \to x$ are also in $\Sigma_D$.

(b) Remove $z \to x$ if $z \to y$ and $y \to x$ are also in $\Sigma_D$.

Let $\Sigma_1$ be the proper ordering of the binary part of $\Sigma'$ given in Proposition 18, and let $\Sigma_3$ be a subordering of this proper ordering on the implications $y \to x$ that appear in triples $A \to x$, $A \to y$, $y \to x$ of (a). Finally, let $\Sigma_2$ be some ordering of the non-binary implications of $\Sigma'$. Then $\sigma = \Sigma_1^\frown\Sigma_2^\frown\Sigma_3$ is an ordered direct sequence for the basis $\Sigma'$. In particular, the length of this sequence is no longer than the length of the D-basis.

Proof. Indeed, following the procedure of Lemma 19, we can replace all $A \to x$ from the triples $A \to x$, $A \to y$, $y \to x$ in $\Sigma_D$ by a second copy of $y \to x$ in the additional binary part $\Sigma_3$ that follows the non-binary part of the D-basis. □

Example 21. Given the D-basis of the closure system $\Sigma_D = (3 \to 2, 2 \to 1, 3 \to 1, 45 \to 3, 45 \to 2, 45 \to 1)$, we can produce a shorter basis $\Sigma' = \{3 \to 2, 2 \to 1, 45 \to 3\}$ with the ordered direct sequence $\sigma = (3 \to 2, 2 \to 1, 45 \to 3, 3 \to 2, 2 \to 1)$. We note that $\Sigma'$ is only half as long as $\Sigma_D$, and its ordered direct sequence $\sigma$ has the same length as the D-basis with optimized binary part, but the size of $\sigma$ is smaller than that of the optimized D-basis.
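The reductions of Corollary 20 and the sequence of Example 21 can be checked with the Python sketch below. The reduction step is our direct reading of (a) and (b), with membership always tested in the original D-basis; the proper order of the binary part is written out by hand rather than computed, and the helper names are ours.

```python
from itertools import combinations


def imp(premise, b):
    return (frozenset(premise), b)


def reduce_d_basis(d_basis):
    """Reductions (a) and (b): drop A -> x whenever A -> y and y -> x are
    also in the D-basis (A binary or not)."""
    present = set(d_basis)

    def reducible(A, x):
        return any(B == A and y != x and imp({y}, x) in present
                   for B, y in d_basis)

    return [(A, x) for A, x in d_basis if not reducible(A, x)]


def single_pass(sequence, X):
    """Ordered iteration rho_sigma(X): one pass over the sequence."""
    true = set(X)
    for premise, b in sequence:
        if premise <= true:
            true.add(b)
    return true


def closure(basis, X):
    """Reference closure: apply the basis until nothing changes."""
    true = set(X)
    while True:
        new = {b for premise, b in basis if premise <= true} - true
        if not new:
            return true
        true |= new


D21 = [imp({3}, 2), imp({2}, 1), imp({3}, 1),
       imp({4, 5}, 3), imp({4, 5}, 2), imp({4, 5}, 1)]
REDUCED = reduce_d_basis(D21)                  # 3 -> 2, 2 -> 1, 45 -> 3
SIGMA = REDUCED + [imp({3}, 2), imp({2}, 1)]   # Sigma_1, Sigma_2, then Sigma_3
SUBSETS = [set(c) for r in range(6) for c in combinations(range(1, 6), r)]

print(all(single_pass(SIGMA, X) == closure(D21, X) for X in SUBSETS))    # True
print(all(single_pass(REDUCED, X) == closure(D21, X) for X in SUBSETS))  # False
```

Without the trailing copy of $3 \to 2, 2 \to 1$, a single pass from $\{4, 5\}$ stops at $\{3, 4, 5\}$.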



9. Closure systems without D-cycles and the E-basis

It turns out that the D-basis can be further reduced when an additional property holds in a closure system $(S, \phi)$. The results of this section follow closely the exposition given in [9], section 2.4.

We will write $x D y$, for $x, y \in S$, if $y \in Y$ for some minimal cover $Y$ of $x$. We note that the D-relation is a subset of the dependence relation from Section 5.

Definition 22. A sequence $x_1, x_2, \dots, x_n$, where $n > 1$, is called a D-cycle if $x_1 D x_2 D \dots x_n D x_1$. A finite closure system $(S, \phi)$ is said to be without D-cycles if it has no D-cycles.
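Since the non-binary implications $X \to x$ of a D-basis are exactly those with $X$ a minimal cover of $x$, the D-relation can be read off the D-basis, and testing for D-cycles becomes a cycle search in a directed graph. A Python sketch using depth-first search follows; the names are ours, and the two test systems are those of Example 16 (no D-cycles) and Example 27 (cycle $2 D 3 D 2$).

```python
def imp(premise, b):
    return (frozenset(premise), b)


def d_relation(d_basis):
    """x D y for every y in the premise X of a non-binary implication X -> x."""
    rel = {}
    for X, x in d_basis:
        if len(X) > 1:
            rel.setdefault(x, set()).update(X)
    return rel


def has_d_cycle(d_basis):
    """Depth-first search for a directed cycle in the D-relation."""
    rel = d_relation(d_basis)
    state = {}  # absent: unvisited, 1: on the current path, 2: finished

    def visit(v):
        state[v] = 1
        for w in rel.get(v, ()):
            if state.get(w) == 1 or (w not in state and visit(w)):
                return True
        state[v] = 2
        return False

    return any(v not in state and visit(v) for v in list(rel))


D16 = [imp({5}, 4), imp({1, 4}, 3), imp({2, 3}, 6), imp({6}, 3),
       imp({1, 5}, 6), imp({2, 4}, 6), imp({2, 4}, 5), imp({3}, 1), imp({2}, 1)]
D27 = [imp({1, 3}, 2), imp({2, 4}, 3), imp({1, 4}, 2), imp({1, 4}, 3)]

print(has_d_cycle(D16), has_d_cycle(D27))  # False True
```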

We note that the lattices of closed sets of closure systems without D-cycles are known in the lattice-theoretical literature as lower bounded.

For every $x \in S$, let $M(x) = \{Y \subseteq S : Y \text{ is a minimal cover of } x\}$. The family $\phi(M(x)) = \{\phi(Y) : Y \in M(x)\}$ is ordered by set containment, so we can consider its minimal elements. Let $M^*(x) = \{Y \in M(x) : \phi(Y) \text{ is minimal in } \phi(M(x))\}$.

We will write $x E y$, for $x, y \in S$, if $y \in Y$ for some $Y \in M^*(x)$. By definition, if $x E y$ then $x D y$; on the other hand, the converse is not always true.

Example 23. Consider the closure system and its D-basis from Example 16. We note that this closure system has no D-cycles. There are three minimal covers of 6: 15, 24 and 23. Since $\phi(15) = S \setminus 2$, $\phi(24) = S$ and $\phi(23) = S \setminus 45$, only two of these covers are in $M^*(6)$: 15 and 23. Thus, while $6 D 4$, we do not have $6 E 4$.
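The passage from $M(x)$ to $M^*(x)$ can be sketched in Python: the covers are read off the non-binary implications of the D-basis and compared by their closures, and together with the binary part they give the subset of the D-basis used in Theorem 25. The names are ours; for the D-basis of Example 27, which has D-cycles, the function still runs, but its output need not be a basis.

```python
def imp(premise, b):
    return (frozenset(premise), b)


def closure(basis, X):
    """Closure of X: apply the basis until nothing changes."""
    true = set(X)
    while True:
        new = {b for premise, b in basis if premise <= true} - true
        if not new:
            return frozenset(true)
        true |= new


def e_basis(d_basis):
    """Binary part of the D-basis plus X -> x for every X in M*(x)."""
    binary = [(X, x) for X, x in d_basis if len(X) == 1]
    covers = {}  # M(x), read off the non-binary implications X -> x
    for X, x in d_basis:
        if len(X) > 1:
            covers.setdefault(x, []).append(X)
    kept = []
    for x, Xs in covers.items():
        phi = {X: closure(d_basis, X) for X in Xs}
        # M*(x): keep X unless some other cover has a strictly smaller closure
        kept += [(X, x) for X in Xs if not any(phi[Y] < phi[X] for Y in Xs)]
    return binary + kept


D16 = [imp({5}, 4), imp({1, 4}, 3), imp({2, 3}, 6), imp({6}, 3),
       imp({1, 5}, 6), imp({2, 4}, 6), imp({2, 4}, 5), imp({3}, 1), imp({2}, 1)]

print(set(D16) - set(e_basis(D16)))  # {(frozenset({2, 4}), 6)}
```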

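We now define two sequences of subsets of $S$, based on covers from $M(x)$ and $M^*(x)$, respectively.

Let $D_0 = E_0 = \{p \in S : p \in \phi(p_1, \dots, p_k) \text{ implies } p \in \phi(p_i) \text{ for some } i \le k\}$. If $D_k$ and $E_k$ are defined, then $D_{k+1} = D_k \cup \{s \in S : \text{for every nontrivial cover } Y \text{ of } s \text{ there is } Z \subseteq D_k \text{ with } Z \ll Y \text{ and } Z \in M(s)\}$. Similarly, $E_{k+1} = E_k \cup \{s \in S : \text{for every nontrivial cover } Y \text{ of } s \text{ there is } Z \subseteq E_k \text{ with } Z \ll Y \text{ and } Z \in M^*(s)\}$. Clearly, $E_k \subseteq D_k$ for any $k$. The following result is proved in [9, Theorem 2.51].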

Lemma 24. If $(S, \phi)$ is a reduced closure system without D-cycles, then, for some $k$, $S = E_k = D_k$.

As a consequence, we can often shorten the D-basis for a closure system without D-cycles. We will say that $s \in S$ has D-rank $k = 0$ if $s \in D_0$, and D-rank $k > 0$ if $s \in D_k \setminus D_{k-1}$. According to Lemma 24, every $s \in S$ in a closure system without D-cycles has a D-rank.
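If the definition of $D_{k+1}$ is read as requiring every minimal cover of $s$ to lie in $D_k$ (our reading: a minimal cover can only be refined by itself), then the D-rank of $s$ is 0 when $s$ has no minimal cover, and otherwise one more than the largest D-rank occurring in its minimal covers, i.e., the length of the longest D-path starting at $s$. A Python sketch under that reading (names ours) recovers the ranks stated for Example 16 later in this section.

```python
def imp(premise, b):
    return (frozenset(premise), b)


def d_ranks(d_basis, ground):
    """D-rank of every element, computed as the longest D-path from it.

    Assumes the system has no D-cycles; otherwise the recursion never
    terminates (Python would raise RecursionError).
    """
    d_successors = {}  # x -> every y with x D y
    for X, x in d_basis:
        if len(X) > 1:
            d_successors.setdefault(x, set()).update(X)
    ranks = {}

    def rank(s):
        if s not in ranks:
            ys = d_successors.get(s, set())
            ranks[s] = 1 + max(rank(y) for y in ys) if ys else 0
        return ranks[s]

    return {s: rank(s) for s in ground}


D16 = [imp({5}, 4), imp({1, 4}, 3), imp({2, 3}, 6), imp({6}, 3),
       imp({1, 5}, 6), imp({2, 4}, 6), imp({2, 4}, 5), imp({3}, 1), imp({2}, 1)]

print(d_ranks(D16, range(1, 7)))  # {1: 0, 2: 0, 3: 1, 4: 0, 5: 1, 6: 2}
```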

Recall that a basis is called aggregated when all its premises are different. Every basis can be brought to the aggregated form by combining the conclusions of all implications with the same premise.
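For instance, the following small Python sketch (names ours) aggregates a unit basis; applied to the D-basis of Example 16 it merges $24 \to 5$ and $24 \to 6$ into $24 \to 56$ and leaves 8 implications.

```python
def aggregate(unit_basis):
    """Combine all conclusions that share a premise: X -> {b1, b2, ...}."""
    grouped = {}  # dicts keep insertion order, so premises stay in order
    for premise, b in unit_basis:
        grouped.setdefault(premise, set()).add(b)
    return [(premise, frozenset(bs)) for premise, bs in grouped.items()]


D16 = [(frozenset(p), b) for p, b in [
    ({5}, 4), ({1, 4}, 3), ({2, 3}, 6), ({6}, 3), ({1, 5}, 6),
    ({2, 4}, 6), ({2, 4}, 5), ({3}, 1), ({2}, 1)]]

print(len(aggregate(D16)))  # 8
```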

Theorem 25. Let $(S, \phi)$ be a reduced closure system without D-cycles. Consider a subset $\Sigma_E$ of the D-basis that is the union of two sets of implications:

(1) $\{y \to x : x \in \phi(y)\}$,

(2) $\{X \to x : X \in M^*(x)\}$.
Then

(a) $\Sigma_E$ is a basis for $(S, \phi)$.

(b) $\Sigma_E$ is ordered direct.

(c) The aggregated form of $\Sigma_E$ is ordered direct.

Proof. To begin with, it is not true that every cover of an element $x \in S$ refines to a cover in $M^*(x)$, so $\Sigma_E$ must be ordered more carefully than $\Sigma_D$. Nonetheless, mimicking the proof of Theorem 2.50 of [9], we can construct an order on $\Sigma_E$ that makes it an ordered direct basis. This will be done for the aggregated E-basis, proving parts (a) and (c) simultaneously; part (b) then follows.

Consider the aggregated form of $\Sigma_E$. Given an implication $X \to Y$ in this basis, let $D^*(X \to Y)$ be the maximal D-rank of elements in $X$, and $D_*(X \to Y)$ be the minimal D-rank of elements in $Y$. Then $D^*(X \to Y) < D_*(X \to Y)$.
Order the implications following the rule: put the implications $x \to Y$ first (the aggregated form of the binary part of $\Sigma_E$), and for the rest, if $D^*(X_1 \to Y_1) < D^*(X_2 \to Y_2)$ then $X_1 \to Y_1$ precedes $X_2 \to Y_2$ in the order.

Claim. If $X_1 \to Y_1$ and $X_2 \to Y_2$ are in the aggregated E-basis, and $Y_1 \cap X_2 \ne \emptyset$, then $X_1 \to Y_1$ precedes $X_2 \to Y_2$.

Indeed, take any $x \in Y_1 \cap X_2$. If the D-rank of $x$ is $k$, then $D^*(X_2 \to Y_2) \ge k \ge D_*(X_1 \to Y_1) > D^*(X_1 \to Y_1)$. Hence, $X_1 \to Y_1$ will appear in the order before $X_2 \to Y_2$.

Now take any input set $Z$. We want to show that $\phi(Z)$ can be obtained by applying the aggregated basis in the described order. We argue by induction on the rank of an element $z \in \phi(Z) \setminus Z$.

If $z \in D_0$, then it can only be obtained via some implication $x \to Y$, for some $x \in Z$ and $z \in Y$, and the implications $x \to Y$ form an initial segment in the ordered sequence of the basis. Now assume that it is already proved that all elements of $\phi(Z) \setminus Z$ of rank at most $k$ can be obtained in some initial segment of the sequence for the basis. If we now have an element $z$ of rank $k + 1$, then it can be obtained via an implication $X \to Y$ with $X \subseteq \phi(Z)$, $z \in Y$, and $D^*(X) \le k$. By the induction hypothesis, all elements in $X \cap (\phi(Z) \setminus Z)$ can be obtained via implications located in some initial segment of the sequence, and by the Claim above, all those implications precede $X \to Y$. Thus, all implications producing elements of rank $k + 1$ from $\phi(Z)$ will be located after the segment of the sequence producing all rank $k$ elements. □

To illustrate the ordering of an E-basis, consider again the closure system given in Example 16. As we know from Example 23, $\Sigma_E$ exists and includes all implications of the D-basis except $24 \to 6$. Elements 1, 2, 4 have D-rank 0; elements 3, 5 have D-rank 1, and the D-rank of 6 is 2. This allows us to impose a proper ordering on the implications of $\Sigma_E$ that turns it into an ordered direct basis:

(1) $5 \to 4$, (2) $6 \to 3$, (3) $3 \to 1$, (4) $2 \to 1$, (5) $14 \to 3$, (6) $24 \to 5$, (7) $23 \to 6$, (8) $15 \to 6$. This basis is also aggregated.

Proposition 26. Suppose $\Sigma_D = \{s_1, s_2, \dots, s_n\}$ is a D-basis of some reduced closure system $(S, \phi)$ and $|S| = m$. It requires time $O(mn^2)$ to determine whether the closure system is without D-cycles and, if it is, to build its ordered direct basis $\Sigma_E$.

Proof. Since the D-relation is a subset of $S^2$, it will contain at most $m^2$ pairs. On the other hand, it is built from implications $X \to x$, so another upper bound for the number of pairs in the D-relation is $mn$. Evidently, the closure system is without D-cycles iff its D-relation can be extended to a linear order. There exists an algorithm that can decide whether $(S, D)$ can be extended to a partial order on $S$ in time $O(m + |D|)$, see Theorem 11.1 in [9]. We will see below that the rest of the algorithm will take time $O(mn^2)$, which makes the total time also $O(mn^2)$.

Assuming the first part of the algorithm provides a positive answer and there are no D-cycles, we proceed by finding the ranks of all elements. It will take at most $n$ operations to find the set $D_0$: include $p$ into $D_0$ if it does not appear as the conclusion of any non-binary implication $X \to x$ of the D-basis. If the system is without D-cycles, then $\pi_\Sigma(D_0) \setminus D_0$ gives the elements of rank 1, $\pi_\Sigma^2(D_0) \setminus \pi_\Sigma(D_0)$ the elements of rank 2, etc. Note that $\pi_\Sigma(X)$ is defined at the beginning of Section 6. Computation of $\pi_\Sigma(X)$ requires time $O(mn)$, since $\Sigma = \Sigma_D$ in our case has $n$ implications, and checking that the premise of each implication is a subset of $X$ takes time $O(m)$. After at most $m$ iterations of $\pi$ on $D_0$, one obtains the whole $S$, whence $O(m^2 n)$ operations are needed to obtain the ranks of all elements of $S$. Assuming that $m \le n$ in most closure systems, this time will not exceed $O(mn^2)$.

It remains to decide which implications from the D-basis should remain in the E-basis. To that end, for each element $x \in S$ we need to compare the closures $\phi(X)$ of the subsets $X$ for which $X \to x$ is in the D-basis, and choose for the E-basis those that are minimal. There are at most $n$ implications $X \to x$ for a given $x \in S$, and the closure $\phi(X)$ of each such $X$ can be found in $O(s(\Sigma_D))$ steps. It will take time $O(n^2)$ to determine all minimal subsets among the $O(n)$ given subsets $\phi(X)$ associated with a fixed $x \in S$. Hence, it will require time $O(mn^2)$ for all $x \in S$.

The size of the E-basis will be at most $n$, and it will take time $O(n^2)$ to order it with respect to the rank of elements, per Theorem 25. □



When a closure system has D-cycles, the subset of $\Sigma_D$ defined in Theorem 25 may not form a basis.

Example 27. Consider $S = \{1, 2, 3, 4\}$ and a closure operator defined by the D-basis

$13 \to 2,\ 24 \to 3,\ 14 \to 2,\ 14 \to 3$.
This closure system has the cycle $2 D 3 D 2$. It is easy to verify that $\Sigma_E$ has only $13 \to 2$ and $24 \to 3$, so the last two implications of the D-basis cannot be recovered from $\Sigma_E$.




Figure 3. Example 27 



Further results about closure systems without D-cycles, and more generally about systems whose closure lattice is join semidistributive, will be presented in

10. D-BASIS VERSUS DUQUENNE-GUIGUES CANONICAL BASIS 

We recall the definition of the canonical basis introduced by V. Duquenne and J.L. Guigues in [11]. This definition applies to arbitrary closure systems, not just reduced ones.
Definition 28. The canonical basis of a closure system $(S, \phi)$ consists of implications $X \to Y$, for $X, Y \subseteq S$, that satisfy the following properties:

(1) $X \subsetneq Y = \phi(X)$;

(2) for any $\phi$-closed set $Z$, either $X \subseteq Z$ or $Z \cap X$ is $\phi$-closed;

(3) if $W \subseteq X$, $\phi(W) = Y$ and $W$ satisfies (2) in place of $X$, then $W = X$.

The subsets $X \subseteq S$ with properties (1) and (2) are usually called quasi-closed; see [4]. The meaning of (2) is that adding $X$ to the family of closed sets of $\phi$ produces the family of closed sets of another closure operator. Property (3) indicates that among all quasi-closed subsets with the same closure one needs to choose the minimal ones. This basis is called canonical, since it is minimal, in that no implication can be removed from it without altering $\phi$, and every other minimal implicational basis for $\phi$ can be obtained from it. In particular, no other basis can have a smaller number of implications. Note that here the implications are of the form $X \to Y$, where $Y$ is not necessarily a one-element set. We will also call it the D-G basis, to distinguish it from the canonical unit direct basis.
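Definition 28 can be checked by brute force on small examples. The following Python sketch, meant only for tiny base sets, assumes the closure system is given by its full family of closed sets (intersection-closed and containing $S$); it returns implications in the form $X \to \phi(X) \setminus X$, as they are written in the examples below. The three-element system in the usage line is our own illustration, not one from the paper.

```python
from itertools import combinations


def canonical_basis(S, closed_sets):
    """Brute-force D-G basis following Definition 28 (exponential in |S|)."""
    S = frozenset(S)
    closed = [frozenset(c) for c in closed_sets]

    def phi(X):  # intersection of all closed sets containing X
        result = S
        for C in closed:
            if X <= C:
                result &= C
        return result

    def quasi_closed(X):  # properties (1) and (2)
        return X != phi(X) and all(X <= Z or phi(Z & X) == Z & X for Z in closed)

    items = sorted(S)
    quasi = [frozenset(c) for r in range(len(items) + 1)
             for c in combinations(items, r) if quasi_closed(frozenset(c))]
    # Property (3): keep X only if no smaller quasi-closed W has the same closure.
    return [(X, phi(X) - X) for X in quasi
            if not any(W < X and phi(W) == phi(X) for W in quasi)]


basis = canonical_basis({1, 2, 3}, [set(), {1}, {2}, {1, 2, 3}])
print([(sorted(X), sorted(Y)) for X, Y in basis])  # [([3], [1, 2]), ([1, 2], [3])]
```

Here $\{1, 3\}$ and $\{2, 3\}$ are also quasi-closed with closure $\{1, 2, 3\}$, but they contain the smaller quasi-closed set $\{3\}$ and are discarded by (3).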

To compare this basis with the other bases discussed in this paper, each implication $X \to Y$ may be replaced by the set of implications $X \to y$, $y \in Y \setminus X$. We will call this modification of the canonical basis the unit D-G basis.
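The unit expansion is a one-line transformation. In the Python sketch below (helper names are ours), it reproduces the implication counts quoted later for Examples 30 and 32: 11 unit implications for the canonical basis of Example 30 and 10 for that of Example 32.

```python
def unit_expansion(basis):
    """Replace every X -> Y by the unit implications X -> y, y in Y minus X."""
    return [(X, y) for X, Y in basis for y in sorted(Y - X)]


def imp(premise, conclusion):
    """Shorthand used in the examples: imp("34", "25") stands for 34 -> 25."""
    return (frozenset(map(int, premise)), frozenset(map(int, conclusion)))


EX30 = [imp("5", "3"), imp("34", "25"), imp("12", "46"),
        imp("46", "12"), imp("235", "4"), imp("356", "124")]
EX32 = [imp("4", "1"), imp("15", "3"), imp("35", "1"), imp("25", "6"),
        imp("56", "2"), imp("26", "5"), imp("36", "14"), imp("134", "6"),
        imp("146", "3")]

print(len(unit_expansion(EX30)), len(unit_expansion(EX32)))  # 11 10
```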

In many cases the canonical basis may be turned into an ordered direct basis. 

Example 29. Consider again the closure system from Example 16. The canonical basis is

$2 \to 1,\ 3 \to 1,\ 5 \to 4,\ 6 \to 3,\ 6 \to 1,\ 14 \to 3,\ 123 \to 6,\ 1345 \to 6,\ 12346 \to 5$.

Moreover, it is ordered direct in the given order.

In general, though, the canonical basis cannot always be ordered so that it becomes direct; that is, it need not be ordered direct. The following two examples were uncovered by running a computer program and checking about a million different closure systems on 5- and 6-element sets. The first example demonstrates a closure system where the canonical basis cannot be ordered, while the unit expansion of this basis does admit an ordering that makes it direct. The second example shows that some canonical bases cannot be ordered in either form.

Example 30.

Let $(S, \phi)$ be a closure system on $S = \{1, 2, 3, 4, 5, 6\}$, given by the family of closed sets $\{\emptyset, 1, 2, 3, 4, 6, 36, 26, 13, 24, 14, 35, 23, 16, 135, 136, 236, 1246, 2345, S\}$. The lattice representation of this system is given in Figure 4.

Then the canonical basis is $5 \to 3$, $34 \to 25$, $12 \to 46$, $46 \to 12$, $235 \to 4$, $356 \to 124$. It is easy to show that this basis cannot be ordered. Indeed, in order to obtain $\phi(145) = S$ in one application of the canonical basis, one would need to put $5 \to 3$ first, then $34 \to 25$, followed by $12 \to 46$. On the other hand, $\phi(123) = S$ too, and the only implication applicable to $123$ is $12 \to 46$, but it comes after $34 \to 25$, and one cannot obtain 5 in the closure otherwise.

As was mentioned, the unit expansion of this canonical basis is still ordered direct: one needs to place the implications $12 \to 4$ and $12 \to 6$ around $34 \to 2$ and $34 \to 5$, thus: $5 \to 3$, $12 \to 4$, $34 \to 2$, $34 \to 5$, $12 \to 6$, $46 \to 2$, $46 \to 1$, $235 \to 4$, $356 \to 1$, $356 \to 2$, $356 \to 4$.
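The claims of Example 30 can be confirmed mechanically. The Python sketch below computes $\phi$ as the intersection of all closed sets containing a given set, tries all $6! = 720$ orderings of the canonical basis, and tests the unit ordering above as well as the aggregated D-basis listed next. This brute-force check is our own illustration, not the program used to find the examples; the helper names are ours.

```python
from itertools import combinations, permutations

S = frozenset(range(1, 7))
# Closed sets of Example 30, written as digit strings.
CLOSED = [frozenset(), S] + [frozenset(map(int, w)) for w in
          "1 2 3 4 6 36 26 13 24 14 35 23 16 135 136 236 1246 2345".split()]
SUBSETS = [frozenset(c) for r in range(7) for c in combinations(sorted(S), r)]


def phi(X):
    """Closure of X: the intersection of all closed sets containing X."""
    result = S
    for C in CLOSED:
        if X <= C:
            result = result & C
    return result


def one_pass(ordered, X):
    """Apply each implication X_i -> Y_i once, in the given order."""
    current = set(X)
    for premise, conclusion in ordered:
        if premise <= current:
            current |= conclusion
    return frozenset(current)


def is_direct(ordered):
    return all(one_pass(ordered, X) == phi(X) for X in SUBSETS)


def imp(premise, conclusion):
    return (frozenset(map(int, premise)), frozenset(map(int, conclusion)))


CANONICAL = [imp("5", "3"), imp("34", "25"), imp("12", "46"),
             imp("46", "12"), imp("235", "4"), imp("356", "124")]
UNIT_ORDER = [imp("5", "3"), imp("12", "4"), imp("34", "2"), imp("34", "5"),
              imp("12", "6"), imp("46", "2"), imp("46", "1"), imp("235", "4"),
              imp("356", "1"), imp("356", "2"), imp("356", "4")]
AGGREGATED_D = [imp("5", "3"), imp("34", "25"), imp("12", "46"), imp("46", "12"),
                imp("25", "4"), imp("56", "124"), imp("123", "5"), imp("134", "6")]

print(any(is_direct(list(p)) for p in permutations(CANONICAL)))  # False
print(is_direct(UNIT_ORDER), is_direct(AGGREGATED_D))             # True True
```

Dropping $123 \to 5$ and $134 \to 6$ from the aggregated D-basis breaks directness: a single pass from $\{1, 2, 3\}$ then stops at $\{1, 2, 3, 4, 6\}$.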
Figure 4. Example 30

As always, the D-basis is ordered direct in both forms: in its original unit form and in the aggregated form. For example, the aggregated form of the D-basis in this example is $5 \to 3$, $34 \to 25$, $12 \to 46$, $46 \to 12$, $25 \to 4$, $56 \to 124$, $123 \to 5$, $134 \to 6$.

One needs to run the canonical basis twice to ensure the closure of an arbitrary subset, i.e., apply $6 \cdot 2 = 12$ implications, versus only 8 implications of the aggregated D-basis. In the unit form, the canonical basis has 11 implications and the D-basis has 13, but the ordering of the canonical basis requires special care.

Remark 31.

Example 30 also shows that the D-basis, unlike the canonical basis, can be redundant (even in its aggregated form): some implications can be removed, and the remaining ones still define the same closure system. In the D-basis of our example, both implications $123 \to 5$ and $134 \to 6$ can be removed, since they follow from $34 \to 25$ and $12 \to 46$. On the other hand, the basis without these two implications is no longer ordered direct.

The following example shows that the canonical basis might be un-orderable in 
either form. 

Example 32.

Let $(S, \phi)$ be a closure system on $S = \{1, 2, 3, 4, 5, 6\}$, given by the family of closed sets $\{\emptyset, 1, 2, 3, 5, 6, 12, 13, 14, 16, 23, 123, 124, 135, 256, 1346, S\}$. The lattice representation of this system is given in Figure 5.

The canonical basis has 9 implications:
$4 \to 1,\ 15 \to 3,\ 35 \to 1,\ 25 \to 6,\ 56 \to 2,\ 26 \to 5,\ 36 \to 14,\ 134 \to 6,\ 146 \to 3$.
There is a single implication, $36 \to 14$, that can be expanded into two unit implications, $36 \to 1$ and $36 \to 4$.

The proof that the unit expansion of the canonical basis cannot be ordered to make it direct follows from consideration of the next three closures:

• $45 \to 145 \to 1345 \to 13456 \to S$, hence $134 \to 6$ should be placed later than $15 \to 3$.

• $1234 \to 12346 \to S$, hence $26 \to 5$ should be placed later than $134 \to 6$.

• $126 \to 1256 \to 12356 \to S$, hence $15 \to 3$ should be placed later than $26 \to 5$, which contradicts the combination of the previous two items.

Figure 5. Example 32

For comparison, the aggregated D-basis has 15 implications:
$4 \to 1,\ 45 \to 26,\ 36 \to 14,\ 34 \to 6,\ 15 \to 3,\ 46 \to 3,\ 35 \to 1,\ 25 \to 6,\ 26 \to 5,\ 56 \to 2,\ 126 \to 34,\ 235 \to 4,\ 156 \to 4,\ 234 \to 5,\ 125 \to 4$.

Thus, one run of the aggregated D-basis (15 implications) wins over two runs (18 implications) of the canonical basis. In unit expansions, the D-basis (18 implications) still wins over two runs (20) of the canonical basis.

In this example, the D-basis is 4 implications shorter than the canonical unit direct basis, which has 22 implications.



Our earlier analysis of the binary part of the D-basis in Proposition 18 carries over to a partial optimization of the canonical basis.

Proposition 33. The binary part of the unit expansion of the D-G canonical basis of any reduced closure system coincides with the binary part of the D-basis (or of the E-basis, if it exists) of the same system.

Proof. We recall that the binary part of the D-basis of a closure system $(S, \phi)$ consists of the implications $y \to x$, where $x \in \phi(y) \setminus y$. This implies that $\{y\}$ is not a $\phi$-closed set. Besides, it is a quasi-closed set, since the intersection of $\{y\}$ with any $\phi$-closed set is either $\{y\}$ or $\emptyset$. Evidently, $\{y\}$ will be the minimum quasi-closed set with the closure $\phi(y)$. Hence, $\{y\} \to \phi(y) \setminus y$ should be an implication in the canonical basis. Clearly, the unit expansion of $\{y\} \to \phi(y) \setminus y$ gives all the implications in the D-basis with the premise $y$. Vice versa, every implication in the canonical basis of the form $y \to Y$ implies that $Y = \phi(y) \setminus y$. Hence, $y \to y'$ for $y' \in Y$ should appear in the D-basis. □

The following statement is an immediate consequence of Proposition 18 and Proposition 33. We recall that $L$ stands for the lattice of closed sets of $(S, \phi)$, and $(J(L), \le)$ is the partially ordered set of join-irreducible elements of $L$.

Corollary 34. Let $\Sigma_C$ be the canonical basis of $(S, \phi)$, where $|S| = m$. Let $C$ be the binary part of $\Sigma_C$, and let $n$ be the number of implications in the unit expansion of $\Sigma_C$. Then an algorithm that requires $O(mn + n^2)$ time will replace each implication $y \to Y$ in $\Sigma_C$ by $y \to Y'$, $Y' \subseteq Y$, where $\phi(y)$ covers $\phi(y')$ in $(J(L), \leq)$ for each $y' \in Y'$. Let $\Sigma_C'$ be this new set of implications. The algorithm will also put an appropriate order on $\Sigma_C'$ in such a way that $\rho_{\Sigma_C} = \rho_{\Sigma_C'}$.
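The reduction step of Corollary 34 can be sketched as follows, assuming a reduced system in which the closures $\phi(\{y\})$ of the singletons are already known. For each $y$, an element $y' \in \phi(y) \setminus y$ is kept only if no other $z \in \phi(y) \setminus y$ has $\phi(y') \subsetneq \phi(z)$; in a reduced system this is the condition that $\phi(y)$ covers $\phi(y')$ in $(J(L), \leq)$. The function name is hypothetical, and the ordering step of the corollary is not implemented.

```python
def cover_reduction(singleton_closure):
    """Shrink each binary implication y -> phi(y) minus {y} to its cover part.

    singleton_closure maps every element y to phi({y}) (a frozenset holding y).
    An element t of phi(y) minus {y} is kept only when there is no z in that
    same set with phi(t) a proper subset of phi(z), i.e. when phi(y) covers
    phi(t) among the closures of singletons (assumes a reduced system).
    """
    reduced = {}
    for y, phi_y in singleton_closure.items():
        targets = phi_y - {y}
        reduced[y] = frozenset(
            t for t in targets
            if not any(singleton_closure[t] < singleton_closure[z] for z in targets)
        )
    return reduced


# A chain 1 < 2 < 3: the binary part is 2 -> 1, 3 -> 1, 3 -> 2.
PHI = {1: frozenset({1}), 2: frozenset({1, 2}), 3: frozenset({1, 2, 3})}
print(cover_reduction(PHI))
```

In this chain, $3 \to 1$ is dropped and only $3 \to 2$ and $2 \to 1$ remain; the implication $3 \to 2$ must then be attended before $2 \to 1$, so that one pass still derives 1 from $\{3\}$. Choosing such an order is the second part of the corollary.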



Thus, the optimization of the canonical basis inspired by Proposition 18 takes the form of a possible size reduction of some implications.

We finish this section with a comparison of the canonical D-G basis with the 
D-basis on some illustrative examples. We consider one particular type of closure 
systems for which the description of the canonical basis is easy. The closure system 
$(S, \phi)$ is called a convex geometry if it satisfies the anti-exchange axiom: if $x \in \phi(C \cup \{y\})$ and $x \notin C$, then $y \notin \phi(C \cup \{x\})$, for all $x \neq y$ in $S$ and all closed $C \subseteq S$.
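For small examples, the anti-exchange axiom can be checked by brute force. The sketch below assumes the closure system is given by its list of closed sets (frozensets), computes $\phi$ as an intersection of closed sets, and tests every closed $C$ and every pair $x \neq y$; this is only practical for small base sets, and the function names are illustrative.

```python
def closure(closed_sets, base, subset):
    """phi(subset): intersection of all closed sets containing subset."""
    result = frozenset(base)
    for c in closed_sets:
        if subset <= c:
            result &= c
    return result


def satisfies_anti_exchange(closed_sets, base):
    """True if x in phi(C + y) and x not in C always imply y not in phi(C + x)."""
    for c in closed_sets:
        for x in base:
            for y in base:
                if x == y or x in c:
                    continue
                if (x in closure(closed_sets, base, c | {y})
                        and y in closure(closed_sets, base, c | {x})):
                    return False
    return True


# Points 1, 2, 3 on a line, with 2 between 1 and 3: {1, 3} is not closed.
LINE = [frozenset(s) for s in ([], [1], [2], [3], [1, 2], [2, 3], [1, 2, 3])]
# Two elements with the same closure: anti-exchange fails.
TWINS = [frozenset(), frozenset({1, 2})]
print(satisfies_anti_exchange(LINE, {1, 2, 3}))  # True
print(satisfies_anti_exchange(TWINS, {1, 2}))    # False
```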

For any closed set $X$ in a convex geometry, the set of extreme points of $X$ is defined as $Ex(X) = \{x \in X : x \notin \phi(X \setminus x)\}$. It is well known that, in every convex geometry, $X = \phi(Ex(X))$. The equivalent statement in the framework of lattice theory is that every element $Y$ in the closure lattice of a finite convex geometry has a unique representation as a join of join-irreducible elements, $Y = \bigvee Y_i$, such that none of the $Y_i$ can be removed (such a representation is called an irredundant join decomposition of $Y$); see, for example, [2].
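The definition of $Ex(X)$ translates directly into code. The sketch below assumes the convex geometry is represented by its list of closed sets, with $\phi$ computed as an intersection of closed sets; the three-point line ($2$ between $1$ and $3$) and the function names are hypothetical illustrations. It checks $X = \phi(Ex(X))$ for $X = \{1, 2, 3\}$.

```python
def closure(closed_sets, base, subset):
    """phi(subset): intersection of all closed sets containing subset."""
    result = frozenset(base)
    for c in closed_sets:
        if subset <= c:
            result &= c
    return result


def extreme_points(closed_sets, base, x_set):
    """Ex(X) = {x in X : x not in phi(X minus {x})}."""
    x_set = frozenset(x_set)
    return frozenset(
        x for x in x_set
        if x not in closure(closed_sets, base, x_set - {x})
    )


# Convex geometry of three points on a line, 2 lying between 1 and 3.
LINE = [frozenset(s) for s in ([], [1], [2], [3], [1, 2], [2, 3], [1, 2, 3])]
X = frozenset({1, 2, 3})
ex = extreme_points(LINE, {1, 2, 3}, X)
print(sorted(ex))                         # [1, 3]
print(closure(LINE, {1, 2, 3}, ex) == X)  # True
```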

An important example of a convex geometry is $Co(\mathbb{R}^n, A)$, where $A$ is a finite set of points in $\mathbb{R}^n$, and $Co(\mathbb{R}^n, A)$ stands for the geometry of convex sets relative to $A$. In other words, the base set of such a closure system is $A$, and the closed sets are the subsets $X$ of $A$ with the property that whenever a point $a \in A$ is in the convex hull of some points from $X$, then $a$ must be in $X$ (see more details of the definition, for example, in [0]).
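For planar point sets, the closure operator of $Co(\mathbb{R}^2, A)$ can be computed exactly with integer orientation tests. The sketch below is restricted to $n = 2$ and integer coordinates (both assumptions). It uses the fact that a point in the convex hull of $X$ lies in the hull of at most three points of $X$, so it suffices to test single points, segments and non-degenerate triangles. The coordinates in the usage example are a hypothetical realization of the five-point configuration of Example 37.

```python
from itertools import combinations


def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive if counterclockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def on_segment(p, a, b):
    """True if p lies on the closed segment from a to b."""
    return (cross(a, b, p) == 0
            and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def in_triangle(p, a, b, c):
    """True if p lies in the closed non-degenerate triangle a, b, c."""
    signs = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    return not (min(signs) < 0 < max(signs))


def in_hull(p, pts):
    """True if p lies in the convex hull of the planar point list pts."""
    if p in pts:
        return True
    if any(on_segment(p, a, b) for a, b in combinations(pts, 2)):
        return True
    return any(cross(a, b, c) != 0 and in_triangle(p, a, b, c)
               for a, b, c in combinations(pts, 3))


def co_closure(points, labels):
    """phi(X) in Co(R^2, A): all labels of A whose point lies in conv(X)."""
    hull = [points[name] for name in labels]
    return frozenset(name for name, p in points.items() if in_hull(p, hull))


# Hypothetical coordinates: d on side ab, x inside triangle dbc (Example 37).
A = {"a": (0, 0), "b": (6, 0), "c": (0, 6), "d": (3, 0), "x": (3, 1)}
print(sorted(co_closure(A, "ab")))   # ['a', 'b', 'd']
print(sorted(co_closure(A, "bcd")))  # ['b', 'c', 'd', 'x']
print(sorted(co_closure(A, "acd")))  # ['a', 'c', 'd']
```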

Lemma 35. If $Y$ is the premise of an implication from the canonical basis of some convex geometry $Co(\mathbb{R}^n, A)$, then $Y$ is the set of extreme points of the closed set $\phi(Y)$, and every proper subset of $Y$ is closed.

Proof. Evidently, the premise $Y$ of every implication of the canonical basis contains $Ex(\phi(Y))$. Moreover, $Co(\mathbb{R}^n, A)$ satisfies the $n$-Carathéodory property, see [T], which means that $|Ex(\phi(Y))| \leq n + 1$. Suppose there exists an implication $Y \to z$ in the canonical basis with $Ex(Y) = \{y_1, y_2, \dots, y_{n+1}\}$, $z \in \phi(Y)$, and $z \notin \phi(Y')$ for every $Y' \subset Ex(Y)$. We claim that every $Y_i = Y \setminus \{y_i\}$ is closed. Indeed, suppose w.l.o.g. that $x \in \phi(Y_{n+1}) = \phi(y_1, \dots, y_n)$ and $x \notin \{y_1, \dots, y_n\}$. Then the simplex generated by $y_1, \dots, y_{n+1}$ is split into the simplices generated by $X_1 = \{x, y_2, \dots, y_{n+1}\}$, $X_2 = \{y_1, x, y_3, \dots, y_{n+1}\}$, ..., $X_n = \{y_1, y_2, \dots, y_{n-1}, x, y_{n+1}\}$. Then $z$ must be in one of those simplices, say, $z \in \phi(X_1)$. Since $Y$ is quasi-closed, $\phi(Y_{n+1}) \subset \phi(Y)$ implies $x \in Y$, and $\phi(X_1) \subset \phi(Y)$ implies $z \in Y$, a contradiction with the assumption $z \notin Y$.

A similar argument applies to any other $Y$ with $|Ex(\phi(Y))| < n + 1$. □

In the next two examples, we consider convex geometries of the form $Co(\mathbb{R}^2, A)$ and compare the canonical bases and D-bases.

Example 36. If $A$ is a set of points in general position, i.e., no three points are on a line, then the D-basis and the canonical basis of the convex geometry $Co(\mathbb{R}^2, A)$ are the same.
Due to the 3-Carathéodory property, all covers can be reduced to covers by three elements. So the D-basis consists of the implications $abc \to x$, for all triangles $abc$ that have $x$ inside.

Now, for any relatively convex subset $X \subseteq A$, $Ex(X)$ consists of the vertices of a convex polygon that holds all the points of $X$ inside. If there exists $y \in X \setminus Ex(X)$, then there are $a, b, c \in X$ with $y \in \phi(a, b, c)$. Hence, $\{a, b, c\} \subseteq X$ is not closed. This does not contradict Lemma 35 only if $X = \{a, b, c\}$. Thus, the only implications in the canonical basis are $abc \to x$, where $x$ is inside the triangle $abc$. It follows that the D-basis and the canonical basis are the same.
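For points in general position, the implications $abc \to x$ of Example 36 can be listed by testing every triangle against every remaining point. The sketch below assumes integer coordinates with no three points collinear, so a strict orientation test suffices; the five points in the usage example and the function names are hypothetical.

```python
from itertools import combinations


def cross(o, a, b):
    """Twice the signed area of triangle o, a, b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def strictly_inside(p, a, b, c):
    """True if p is in the open triangle a, b, c (general position assumed)."""
    signs = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    return all(s > 0 for s in signs) or all(s < 0 for s in signs)


def triangle_implications(points):
    """All implications abc -> x with x strictly inside triangle abc."""
    result = []
    for tri in combinations(points, 3):
        for x in points:
            if x not in tri and strictly_inside(points[x], *(points[v] for v in tri)):
                result.append(("".join(tri), x))
    return result


# Hypothetical points in general position: p and q lie inside triangle abc.
PTS = {"a": (0, 0), "b": (6, 0), "c": (0, 6), "p": (2, 1), "q": (1, 2)}
print(triangle_implications(PTS))
```

Besides $abc \to p$ and $abc \to q$, the sketch also finds $abq \to p$ and $acp \to q$, since $p$ lies inside the smaller triangle $abq$ and $q$ inside $acp$.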

Example 37. If $A$ is a set of points that is not in general position, then the canonical basis of $Co(\mathbb{R}^2, A)$ is a proper subset of the D-basis.

Indeed, consider a configuration of 5 points: $a, b, c$ form a triangle, $x$ is inside the triangle, and $d$ is on the side $ab$, so that $x$ is also inside the triangle $dbc$. The D-basis is $ab \to d$, $abc \to x$, $bcd \to x$, while the canonical basis is $ab \to d$, $bcd \to x$.

Note that $abc$ cannot be a premise of an implication in the canonical basis due to Lemma 28, since the subset $ab$ is not closed.

We note that Lemma 35 is not true for arbitrary convex geometries.

Example 38. Take the convex geometry $(\{a, b, c, d, x\}, \phi)$ of Example 37. Adding another closed set $\{b, c, d\}$ results in a new convex geometry $(\{a, b, c, d, x\}, \psi)$ with the canonical basis $ab \to d$, $abcd \to x$. Note that in the implication $abcd \to x$ we have $d \notin Ex(abcd)$, and the subset $ab$ is not closed.
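The canonical bases stated in Examples 37 and 38 can be double-checked by brute force. The sketch below encodes the closure of Example 37 by two rules read off the configuration ($d$ lies on $ab$; $x$ lies in the triangles $abc$ and $dbc$ but not in $adc$), which is our assumption about the intended geometry, and adds the closed set $\{b, c, d\}$ for Example 38. It then enumerates the pseudo-closed sets in order of size, using the usual definition: $P$ is pseudo-closed if $P \neq \phi(P)$ and $\phi(Q) \subseteq P$ for every pseudo-closed $Q \subsetneq P$. Exhaustive enumeration of this kind is only feasible for very small base sets.

```python
from itertools import combinations

BASE = "abcdx"


def example37_closure(s):
    """Closure in Example 37 (assumed encoding): ab -> d, and bc plus (a or d) -> x."""
    s = set(s)
    if {"a", "b"} <= s:
        s.add("d")
    if {"b", "c"} <= s and s & {"a", "d"}:
        s.add("x")
    return frozenset(s)


def closure(closed_sets, subset):
    """Intersection of all closed sets containing subset."""
    result = frozenset(BASE)
    for c in closed_sets:
        if subset <= c:
            result &= c
    return result


def canonical_basis(closed_sets):
    """Duquenne-Guigues basis P -> phi(P) minus P over all pseudo-closed P."""
    pseudo, basis = [], []
    for r in range(len(BASE) + 1):  # by size, so smaller pseudo-closed sets come first
        for combo in combinations(BASE, r):
            p = frozenset(combo)
            cl = closure(closed_sets, p)
            if cl != p and all(closure(closed_sets, q) <= p for q in pseudo if q < p):
                pseudo.append(p)
                basis.append(("".join(sorted(p)), "".join(sorted(cl - p))))
    return basis


all_subsets = [frozenset(c) for r in range(len(BASE) + 1) for c in combinations(BASE, r)]
CLOSED_37 = [s for s in all_subsets if example37_closure(s) == s]
CLOSED_38 = CLOSED_37 + [frozenset("bcd")]
print(canonical_basis(CLOSED_37))  # [('ab', 'd'), ('bcd', 'x')]
print(canonical_basis(CLOSED_38))  # [('ab', 'd'), ('abcd', 'x')]
```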



11. Testing the performance of D-basis 

The performance of the D-basis in comparison with the D-G unit basis and the canonical unit direct basis was tested on 300,000 randomly generated closure systems on base sets of 6 and 7 elements. The closed sets in these systems were generated by taking 3 to 8 arbitrary subsets of the domain, the intersections of all combinations of these sets, the empty set, and the domain itself.
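A generator following this description might look like the sketch below. How the "arbitrary" subsets were drawn is not stated, so choosing each element independently with probability 1/2 is our assumption, as is the function name. The resulting family is closed under intersection by construction, since the intersection of two intersections of generators is again an intersection of generators.

```python
import random
from itertools import combinations


def random_closure_system(n, rng):
    """Closed sets: intersections of 3 to 8 random subsets, plus {} and the domain."""
    domain = frozenset(range(1, n + 1))
    k = rng.randint(3, 8)  # number of generating subsets (inclusive bounds)
    # Assumption: each element joins a generator independently with probability 1/2.
    generators = [frozenset(e for e in domain if rng.random() < 0.5) for _ in range(k)]
    closed = {frozenset(), domain}
    for r in range(1, k + 1):
        for combo in combinations(generators, r):
            closed.add(frozenset.intersection(*combo))
    return domain, closed


domain, closed = random_closure_system(6, random.Random(2024))
print(len(closed), sorted(sorted(c) for c in closed))
```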

The computation of the closure of a random input set $X$ was implemented, for the D-G unit basis, according to the folklore algorithm, which essentially computes $\pi(X)$, $\pi^2(X)$, etc., on its consecutive loops. This algorithm is presented, for example, as an algorithm in section 2 of [18]; see also our discussion of this algorithm in comparison with the forward chaining algorithm in section 7. Based on this algorithm, computing the closure of an input set using the D-G basis will always take at least two passes: the final pass produces nothing and exists solely to determine that the ability of the basis to expand the given set has been exhausted.
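The folklore loop can be sketched as follows, taking $\pi(X) = X \cup \bigcup\{B : A \to B,\ A \subseteq X\}$ (our reading of the operator): each pass applies $\pi$ to the set obtained after the previous pass, and the loop stops after the first pass that adds nothing. Representing implications as pairs of frozensets and counting attended implications are assumptions made for illustration; the function name is hypothetical.

```python
def folklore_closure(basis, x):
    """Iterate pi until a pass adds nothing; return closure and #implications attended."""
    current = frozenset(x)
    attended = 0
    while True:
        nxt = set(current)
        for premise, conclusion in basis:
            attended += 1
            if premise <= current:  # premise tested against the set from the previous pass
                nxt |= conclusion
        if nxt == current:
            return current, attended
        current = frozenset(nxt)


BASIS = [(frozenset({2}), frozenset({3})), (frozenset({1}), frozenset({2}))]
print(folklore_closure(BASIS, {1}))  # (frozenset({1, 2, 3}), 6)
```

Here the closure of $\{1\}$ needs three passes over two implications: two passes that add elements and a final pass that adds nothing.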

In contrast, the computation of the closure of any input set by the D-basis or the canonical unit direct basis is done in a single loop of such an algorithm. The data collected reflect the number of implications attended in the run of each algorithm. Thus, for these two bases, the comparison amounts to a comparison of their lengths.
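With an ordered direct basis, a single pass suffices, provided the implications are attended in the prescribed order and the current set is updated as soon as an implication fires. A minimal sketch, representing implications as pairs of frozensets (an assumption; the function name is hypothetical):

```python
def one_pass_closure(ordered_basis, x):
    """Attend each implication once, in order, updating the set immediately."""
    current = set(x)
    for premise, conclusion in ordered_basis:
        if premise <= current:
            current |= conclusion
    return frozenset(current), len(ordered_basis)


GOOD_ORDER = [(frozenset({1}), frozenset({2})), (frozenset({2}), frozenset({3}))]
BAD_ORDER = list(reversed(GOOD_ORDER))
print(one_pass_closure(GOOD_ORDER, {1}))  # (frozenset({1, 2, 3}), 2)
print(one_pass_closure(BAD_ORDER, {1}))   # (frozenset({1, 2}), 2)
```

The second call shows why the order matters: the same two implications attended in the reverse order miss the element 3 in one pass.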

In the testing on a domain of size 6, with input sets of size 3, the D-G unit basis cycled through, on average, 22.9 implications before returning the closure. By comparison, the direct canonical (optimal) basis took 15.8 such steps, and the D-basis took only 12.7 checks on average. Since both bases are ordered direct, the number of implications checked for the direct optimal basis and for the D-basis equals the number of implications they contain.

Closed sets   D-G unit basis   Direct optimal basis   D-basis
          5            32.37                  18.71     16.45
         10            22.59                  16.24     12.23
         15            21.22                  15.54     12.47
         20            18.13                  13.06     11.45
         25            15.54                  11.09     10.34
         30            11.70                   7.96      7.65

Table 2. Average implications checked to expand an arbitrary 3-element set in a domain of size 6.

It was observed that the efficiency gap between the direct and indirect bases was greatest when there were fewer closed sets, meaning that more subsets could be expanded through the bases' implications. This relation is shown in Figure 6.
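The trend can be read off Table 2 with a few subtractions. The sketch below copies the rows of Table 2 and computes, for each number of closed sets, how many more implications the D-G unit basis attends than each of the two direct bases. Both differences shrink as the number of closed sets grows.

```python
# Table 2 rows: closed sets -> (D-G unit basis, direct optimal basis, D-basis)
TABLE_2 = {
    5: (32.37, 18.71, 16.45),
    10: (22.59, 16.24, 12.23),
    15: (21.22, 15.54, 12.47),
    20: (18.13, 13.06, 11.45),
    25: (15.54, 11.09, 10.34),
    30: (11.70, 7.96, 7.65),
}

# Extra implications attended by the D-G unit basis, row by row.
gap_optimal = [round(dg - opt, 2) for dg, opt, _ in TABLE_2.values()]
gap_d_basis = [round(dg - d, 2) for dg, _, d in TABLE_2.values()]
print(gap_optimal)
print(gap_d_basis)
```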
Figure 6. Implications checked (y-axis) by the number of closed sets in the basis (x-axis) for a domain of size 6.

We saw similar results on bases of domain size 7. There, we once again saw the convergence of the D-basis and the direct optimal basis as the number of closed sets approached either extreme, with a more pronounced gap in between.



ORDERED DIRECT IMPLICATIONAL BASIS OF A FINITE CLOSURE SYSTEM 25 



Closed sets   D-G unit basis   Direct optimal basis   D-basis
          5            46.73                  27.70     23.57
         10            33.74                  26.26     17.92
         15            32.11                  26.80     18.59
         20            31.01                  25.68     19.43
         25            29.77                  23.99     19.66
         30            26.71                  20.64     17.73

Table 3. Average implications checked to expand an arbitrary 3-element set in a domain of size 7.



There were 33.8 checks on average for the D-G unit basis, and 26.0 and 19.0 for the direct optimal basis and the D-basis, respectively; see Table 3.
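Subtracting columns of Table 3 shows the pattern described for domain size 7: the difference between the direct optimal basis and the D-basis is largest for 10 and 15 closed sets and smallest at the two ends of the range. The sketch below simply copies the table rows.

```python
# Table 3 rows: closed sets -> (D-G unit basis, direct optimal basis, D-basis)
TABLE_3 = {
    5: (46.73, 27.70, 23.57),
    10: (33.74, 26.26, 17.92),
    15: (32.11, 26.80, 18.59),
    20: (31.01, 25.68, 19.43),
    25: (29.77, 23.99, 19.66),
    30: (26.71, 20.64, 17.73),
}

# Gap between the direct optimal basis and the D-basis, per number of closed sets.
gap = {k: round(opt - d, 2) for k, (_, opt, d) in TABLE_3.items()}
widest = max(gap, key=gap.get)
print(gap, widest)
```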

Figure 7. Implications checked (y-axis) by the number of closed sets in the basis (x-axis) for a domain of size 7; the curves show the D-G unit basis, the direct optimal basis, and the D-basis.